BLEOMYCIN GENE CLUSTER COMPONENTS AND THEIR USES

CROSS-REFERENCE TO RELATED APPLICATIONS 

This application claims benefit under 35 U.S.C. § 119 of provisional
applications USSN 60/115,435, filed on January 6, 1999, and USSN 60/118,848, filed on
February 5, 1999, both of which are herein incorporated by reference in their entirety for all
purposes.

STATEMENT AS TO RIGHTS TO INVENTIONS MADE UNDER FEDERALLY 
SPONSORED RESEARCH AND DEVELOPMENT 
This work was supported in part by an Institutional Research Grant from the 
American Cancer Society and the School of Medicine, University of California, Davis,
National Institutes of Health Grant Number A140475, and a grant from the Searle Scholars 
Program of the Chicago Community Trust. The Government of the United States of 
America may have certain rights in this invention. 

FIELD OF THE INVENTION 
This invention relates to the field of polyketide synthesis and nonribosomal
polypeptide synthesis. In particular, this invention pertains to the isolation of the bleomycin
gene cluster, which encodes the first identified hybrid polyketide synthase/nonribosomal
peptide synthetase pathway.

BACKGROUND OF THE INVENTION 
Polyketides and nonribosomal peptides are two large families of natural
products that include many clinically valuable drugs, such as erythromycin and vancomycin
(antibacterial), FK506 and cyclosporin (immunosuppressant), and epothilone and bleomycin
(BLM) (antitumor). The biosyntheses of polyketides and nonribosomal peptides are
catalyzed by polyketide synthases (PKSs) (Hopwood (1997) Chem. Rev. 97: 2465; Katz
(1997) Chem. Rev. 97: 2557; Khosla (1997) Chem. Rev. 97: 2577; Ikeda and Omura
(1997) Chem. Rev. 97: 2591; Staunton and Wilkinson (1997) Chem. Rev. 97: 2611; Cane et
al. (1998) Science 282: 63) and nonribosomal peptide synthetases (NRPSs) (Cane et
al. (1998) Science 282: 63; Marahiel et al. (1997) Chem. Rev. 97: 2651; von Döhren et al.
(1997) Chem. Rev. 97: 2675), respectively. Remarkably, PKSs and NRPSs use a very
similar strategy for the assembly of these two distinct classes of natural products by
sequential condensation of short carboxylic acids and amino acids, respectively, and utilize
the same 4'-phosphopantetheine prosthetic group, via a thioester linkage, to channel the
growing polyketide or peptide intermediate during the elongation processes.

Both type I PKSs and NRPSs are multifunctional proteins that are organized 
into modules. (A module is defined as a set of distinctive domains that encode all the 
enzyme activities necessary for one cycle of polyketide or peptide chain elongation and 
associated modifications.) The number and order of modules and the type of domains within 
a module on each PKS or NRPS protein determine the structural variations of the resulting 
polyketide and peptide products by dictating the number, order, choice of the carboxylic acid 
or amino acid to be incorporated, and the modifications associated with a particular cycle of 
elongation. These features of PKS and NRPS inspired us to search for a hybrid PKS and
NRPS system. Since the modular architecture of both PKS (Cane et al. (1998) Science 282:
63; Katz and Donadio (1993) Ann. Rev. Microbiol. 47: 875; Hutchinson and Fujii
(1995) Ann. Rev. Microbiol. 49: 201) and NRPS (Cane et al. (1998) Science 282: 63;
Stachelhaus et al. (1995) Science 269: 69; Stachelhaus et al. (1998) Mol. Gen. Genet. 257:
308; Belshaw et al. (1999) Science 284: 486) has been exploited successfully in
combinatorial biosynthesis of diverse "unnatural" natural products, it is envisioned that a
hybrid PKS and NRPS system, capable of incorporating both carboxylic acids and amino
acids into the final products, could lead to even greater chemical structural diversity.

The BLMs, differing structurally at the C-terminal amines of the
glycopeptides, are a family of antibiotics produced by Streptomyces verticillus (Sv). BLMs
exhibit strong antitumor activity through a metal-dependent oxidative cleavage of DNA or
RNA in the presence of molecular oxygen and are incorporated into current chemotherapy of
several malignancies under the trade name of Blenoxane®, which contains BLM A2 and BLM
B2 as the principal constituents (Sikic et al. Eds. (1985) Bleomycin Chemotherapy,
Academic Press, New York; Natrajan and Hecht (1994) pages 197-242 In: Molecular
Aspects of Anticancer Drug-DNA Interaction Vol. 2, Neidle and Waring Eds., Macmillan,
London). Umezawa, Fujii, Takita, and co-workers extensively studied the biosynthesis of
BLM in Sv ATCC15003 by feeding isotope-labeled precursors and by isolating various
biosynthetic intermediates and shunt metabolites, establishing that the BLMs are in fact
natural hybrid metabolites of polyketide and peptide biosynthesis (Takita and Muroka (1990)
pages 289-309 In: Biochemistry of Peptide Antibiotics: Recent Advances in the
Biotechnology of β-Lactams and Microbial Peptides, Kleinkauf and von Döhren Eds., W. de
Gruyter, New York). On the assumption that BLM biosynthesis follows the paradigm for
peptide and polyketide biosynthesis, we predict that the Blm megasynthetase, which
catalyzes the assembly of the BLM backbone from nine amino acids and one acetate, should
bear the characteristics of both NRPS and PKS, providing an excellent model to study the
mechanism by which NRPS and PKS could be integrated into a productive biosynthetic
system to synthesize a hybrid peptide and polyketide metabolite (Fig. 1A) (Shen et al. (1999)
Bioorg. Chem. 27: 155).

SUMMARY OF THE INVENTION 

This invention pertains to the isolation and elucidation of the bleomycin gene
cluster. Nucleic acid sequences encoding all of the open reading frames (ORFs) that encode
polypeptides sufficient to direct the biosynthesis of bleomycin are provided. The nucleic
acids can be used in their "native" format or recombined in a wide variety of manners to
create novel synthetic pathways.

In one embodiment, this invention provides an isolated nucleic acid
comprising a nucleic acid selected from the group consisting of a nucleic acid encoding any
one of Blm open reading frames (ORFs) 8 through 41, and/or a nucleic acid encoding a
polypeptide encoded by any one of Blm open reading frames (ORFs) 8 through 41, and/or a
nucleic acid amplified by polymerase chain reaction (PCR) using any one of the primer pairs
identified in Table II and the nucleic acid of a bleomycin-producing organism as a template.
The nucleic acid may comprise one or multiple (e.g., two, more preferably three or more)
bleomycin open reading frames (i.e., BLM ORFs 8 through 41). One preferred nucleic acid
comprises a nucleic acid encoding a C domain lacking one or more His residues of the
conserved HHxxxDG active site for transpeptidation. In another preferred embodiment the
nucleic acid comprises a nucleic acid encoding a protein encoded by a gene selected from the
group consisting of blmI, blmII, and blmXI.
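
The conserved HHxxxDG motif named above lends itself to a simple sequence scan. What follows is a minimal sketch only, assuming one-letter amino acid codes with x standing for any residue; the function name and both toy sequences are hypothetical and are not taken from the blm cluster.

```python
import re

# HHxxxDG: two His, any three residues, then Asp-Gly (one-letter codes).
# The lookahead lets overlapping occurrences be reported as well.
C_DOMAIN_MOTIF = re.compile(r"(?=HH[A-Z]{3}DG)")


def find_intact_motifs(protein_seq):
    """Return 0-based start positions of intact HHxxxDG motifs."""
    return [m.start() for m in C_DOMAIN_MOTIF.finditer(protein_seq.upper())]


# Hypothetical toy sequences, for illustration only.
intact = "MKLHHIVTDGWSA"   # carries HHIVTDG starting at index 3
variant = "MKLAHIVTDGWSA"  # first His replaced, so no intact motif

print(find_intact_motifs(intact))
print(find_intact_motifs(variant))
```

A C domain "lacking one or more His residues" of the motif would, under these assumptions, simply return an empty list, as the second toy sequence does.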

In another embodiment this invention provides an isolated nucleic acid
encoding a (biosynthetic) module comprising two or more (more preferably three or more,
most preferably four or more) catalytic domains of a protein encoded by a nucleic acid of a
bleomycin gene cluster wherein said catalytic domains are selected from the group consisting
of a condensation (C) domain, an adenylation (A) domain, a peptidyl carrier protein (PCP)
domain, a condensation/cyclization domain (Cy), an acyl-carrier protein (ACP)-like domain,
an oxidization domain (Ox), a ketoacyl synthase (KS) domain, an acetyl transferase (AT)
domain, a ketoreductase (KR) domain, and a methyltransferase (MT) domain. Preferred
nucleic acids comprise a nucleic acid encoding one or more proteins comprising a module
selected from the group consisting of NRPS-0, NRPS-1, NRPS-2, NRPS-3, NRPS-4,
NRPS-5, NRPS-6, NRPS-7, NRPS-8, NRPS-9, and PKS. Particularly preferred nucleic acids
comprise an open reading frame from SEQ ID NO: 1, SEQ ID NO: 2, or SEQ ID NO: 3.

WO 00/40704 PCT/US00/00445

In still another embodiment, this invention provides an isolated nucleic acid
comprising a nucleic acid encoding a protein encoded by a gene from a BLM gene cluster.
Preferred nucleic acids encode a protein encoded by a gene selected from the group
consisting of blmI, blmII, and blmXI. In another embodiment, preferred nucleic acids
encode a protein encoded by a gene selected from the group consisting of blmIII, blmIV,
blmV, blmVI, blmVII, blmIX, and blmX. In still yet another embodiment, the nucleic acid
comprises a nucleic acid encoding a protein encoded by blmVIII. Particularly preferred
nucleic acids comprise a nucleic acid selected from the group consisting of blmI, blmII, and
blmXI. Other particularly preferred nucleic acids comprise a nucleic acid selected from the
group consisting of blmIII, blmIV, blmV, blmVI, blmVII, blmIX, and blmX, while still other
particularly preferred nucleic acids comprise blmVIII.

In still yet another embodiment, this invention provides an isolated nucleic
acid comprising a nucleic acid that encodes a protein comprising at least one catalytic
domain selected from the group consisting of a condensation (C) domain, an adenylation (A)
domain, a peptidyl carrier protein (PCP) domain, a condensation/cyclization domain (Cy), an
acyl-carrier protein (ACP)-like domain, an oxidization domain (Ox), a ketoacyl synthase
(KS) domain, an acetyl transferase (AT) domain, a ketoreductase (KR) domain, and a
methyltransferase (MT) domain, and that hybridizes to a nucleic acid selected from the group
consisting of orf8, orf9, orf10, orf11, orf12, orf13, orf14, orf15, orf16, orf17, orf18,
orf19, orf20, orf21, orf22, orf23, orf24, orf25, orf26, orf27, orf28, orf29, orf30, orf31, orf32,
orf33, orf34, orf35, orf36, orf37, orf38, orf39, and orf40 under stringent conditions. In
certain embodiments this also includes nucleic acids that would stringently hybridize as
indicated above but for the degeneracy of the nucleic acid code. In other words, if silent
mutations could be made in the subject sequence so that it hybridizes to the indicated
sequence(s) under stringent conditions, it would be included in certain embodiments. A
preferred isolated nucleic acid comprises a nucleic acid encoding a module. A particularly
preferred isolated nucleic acid comprises a nucleic acid encoding a BLM gene.

This invention also provides a nucleic acid comprising a nucleic acid selected
from the group consisting of orf8, orf9, orf10, orf11, orf12, orf13, orf14, orf15,
orf16, orf17, orf18, orf19, orf20, orf21, orf22, orf23, orf24, orf25, orf26, orf27, orf28,
orf29, orf30, orf31, orf32, orf33, orf34, orf35, orf36, orf37, orf38, orf39, and orf40, or an
allelic variant thereof. Preferred nucleic acids comprise a nucleic acid that is a single
nucleotide polymorphism (SNP) of a nucleic acid selected from the group consisting of
orf8, orf9, orf10, orf11, orf12, orf13, orf14, orf15, orf16, orf17, orf18,
orf19, orf20, orf21, orf22, orf23, orf24, orf25, orf26, orf27, orf28, orf29, orf30, orf31, orf32,
orf33, orf34, orf35, orf36, orf37, orf38, orf39, and orf40.

This invention also provides an isolated gene cluster comprising open reading 
frames encoding polypeptides sufficient to direct the assembly of a bleomycin. 

In one embodiment this invention provides an isolated multi-functional
protein complex comprising both a polyketide synthase (PKS) and a polypeptide synthetase 
(NRPS) and/or an isolated nucleic acid encoding a multi-functional protein complex 
comprising both a polyketide synthase (PKS) and a polypeptide synthetase (NRPS). 

This invention also provides various blm cluster polypeptides or blm cluster- 
derived polypeptides. Thus, in one embodiment this invention provides an isolated 
polypeptide comprising a catalytic domain encoded by a nucleic acid of a bleomycin gene 
cluster wherein said nucleic acid comprises a nucleic acid selected from the group consisting 
of a nucleic acid encoding any one of Blm open reading frames (ORFs) 8 through 41; and/or 
a nucleic acid amplified by polymerase chain reaction (PCR) using any one of the primer 
pairs identified in Table II. Preferred polypeptides comprise an enzymatic domain selected
from the group consisting of a condensation (C) domain, an adenylation (A) domain, a
peptidyl carrier protein (PCP) domain, a condensation/cyclization domain (Cy), an
acyl-carrier protein (ACP)-like domain, an oxidization domain (Ox), a ketoacyl synthase (KS)
domain, an acetyl transferase (AT) domain, a ketoreductase (KR) domain, and a
methyltransferase (MT) domain. Particularly preferred polypeptides are encoded by the
nucleic acids described above and herein.

This invention also provides expression vectors comprising any of the nucleic
acids described herein and/or host cells (e.g., Streptomyces) transfected and/or transformed
with any of these expression vectors. A preferred host cell is transformed with an exogenous
nucleic acid comprising a gene cluster encoding polypeptides sufficient to direct the
assembly of a bleomycin or bleomycin analog.

This invention also provides methods of use of the blm and blm-derived
nucleic acid(s) and/or polypeptides. One such method is a method of chemically modifying
a biological molecule. The method involves contacting a biological molecule that is a
substrate for a polypeptide encoded by one or more bleomycin biosynthesis gene cluster
open reading frames with the polypeptide encoded by one or more bleomycin biosynthesis
gene cluster open reading frames, whereby the polypeptide chemically modifies the
biological molecule. In one particularly preferred embodiment, the biological molecule is an
amino acid and said polypeptide is a peptide synthetase. In another preferred embodiment,
the polypeptide is a methyl transferase. Other substrates and the encoded polypeptides are
illustrated in Table II.

In another embodiment this invention provides a method of coupling a first
amino acid to a second amino acid. This method involves contacting the first and second
amino acid with a recombinant expressed bleomycin nonribosomal peptide synthetase
(NRPS). A preferred NRPS is selected from the group consisting of NRPS-5, NRPS-4,
NRPS-3, NRPS-9, NRPS-8, and NRPS-7. Another preferred NRPS is selected from the
group consisting of NRPS-6, NRPS-2, NRPS-1, and NRPS-0. The contacting can be in vivo
(e.g., in a host cell) or ex vivo.

In another embodiment this invention provides a method of coupling a first
fatty acid to a second fatty acid, said method comprising contacting the first and second fatty
acids with a recombinant expressed bleomycin polyketide synthase (PKS). Again, the
contacting can be in vivo (e.g., in a host cell) or ex vivo.

In still another embodiment, this invention provides a method of producing a
bleomycin or bleomycin analog. The method involves providing a cell transformed with an
exogenous nucleic acid comprising a bleomycin gene cluster encoding polypeptides
sufficient to direct the assembly of said bleomycin or bleomycin analog; culturing the cell
under conditions permitting the biosynthesis of bleomycin or bleomycin analog; and
isolating said bleomycin or bleomycin analog from said cell.

This invention also provides an isolated nucleic acid comprising a nucleic
acid encoding a phosphopantetheinyl transferase, said nucleic acid encoding a
phosphopantetheinyl transferase being selected from the group consisting of: a nucleic acid
encoding the protein encoded by the nucleic acid of SEQ ID NO:3; a nucleic acid amplified
by polymerase chain reaction (PCR) using primers that specifically amplify ORF 41
(primers: SEQ ID NO:71 and SEQ ID NO:72) and Streptomyces nucleic acid as a template; a
nucleic acid encoding a polypeptide having phosphopantetheinyl transferase activity where
said nucleic acid specifically hybridizes to the nucleic acid of SEQ ID NO:3 under stringent
conditions. In one embodiment, the nucleic acid comprises the nucleic acid of SEQ ID
NO:3.

In another embodiment, this invention provides a polypeptide comprising a
phosphopantetheinyl transferase encoded by SEQ ID NO:3 or a polypeptide having
phosphopantetheinyl transferase activity and the sequence encoded by the nucleic acid of
SEQ ID NO:3 or conservative substitutions of that polypeptide.

Also provided are vectors comprising a nucleic acid encoding a
phosphopantetheinyl transferase (e.g., as described above) and cells transfected with the
vector.

This invention also provides a method of converting an apo-carrier protein to
a holo-carrier protein, said method comprising reacting said apo-carrier protein with a
recombinant phosphopantetheinyl transferase encoded by SEQ ID NO:3 and coenzyme A,
thereby producing a holo-carrier protein.

In certain embodiments, this invention specifically excludes one or more of 
open reading frames 1 through 41. In particularly preferred embodiments, this invention
excludes open reading frames 1 through 7 (Orf 1 - Orf 7).

DEFINITIONS

The term "polyketide synthases" (PKSs) refers to multifunctional enzymes
related to fatty acid synthases (FASs). PKSs catalyze the biosynthesis of polyketides through
repeated (decarboxylative) Claisen condensations between acylthioesters, usually acetyl,
propionyl, malonyl or methylmalonyl. Following each condensation, they typically
introduce structural variability into the product by catalyzing all, part, or none of a reductive
cycle comprising a ketoreduction, dehydration, and enoylreduction on the β-keto group of
the growing polyketide chain. PKSs incorporate enormous structural diversity into their
products, in addition to varying the condensation cycle, by controlling the overall chain
length, choice of primer and extender units and, particularly in the case of aromatic
polyketides, regiospecific cyclizations of the nascent polyketide chain. After the carbon
chain has grown to a length characteristic of each specific product, it is typically released
from the synthase by thiolysis or acyltransfer. Thus, PKSs consist of families of enzymes
which work together to produce a given polyketide. Two general classes of PKSs exist. One
class, known as Type I PKSs, is represented by the PKSs for macrolides such as
erythromycin. These "complex" or "modular" PKSs include assemblies of several large
multifunctional proteins carrying, between them, a set of separate active sites for each step of
carbon chain assembly and modification (Cortes et al. (1990) Nature 348: 176; Donadio et
al. (1991) Science 252: 675; MacNeil et al. (1992) Gene 115: 119). Structural diversity
occurs in this class from variations in the number and type of active sites in the PKSs. This
class of PKSs displays a one-to-one correlation between the number and clustering of active
sites in the primary sequence of the PKS and the structure of the polyketide backbone. The
second class of PKSs, called Type II PKSs, is represented by the synthases for aromatic
compounds. Type II PKSs typically have a single set of iteratively used active sites (Bibb et
al. (1989) EMBO J. 8: 2727; Sherman et al. (1989) EMBO J. 8: 2717; Fernandez-Moreno
et al. (1992) J. Biol. Chem. 267: 19278).

A "nonribosomal peptide synthase" (NRPS) refers to an enzymatic complex
of eucaryotic or procaryotic origin that is responsible for the synthesis of peptides by a
nonribosomal mechanism, often known as thiotemplate synthesis (Kleinkauf and von
Doehren (1987) Ann. Rev. Microbiol. 41: 259-289). Such peptides, which can be up to 20 or
more amino acids in length, can have a linear, cyclic (cyclosporin, tyrocidine,
mycobacilline, surfactin and others) or branched cyclic structure (polymyxin, bacitracin and
others) and often contain amino acids not present in proteins or modified amino acids
through methylation or epimerization.

A "module" refers to a set of distinctive polypeptide domains that encode all
the enzyme activities necessary for one cycle of polyketide or peptide chain elongation and
associated modifications.

The terms "isolated", "purified", or "biologically pure" refer to material which
is substantially or essentially free from components which normally accompany it as found
in its native state. With respect to nucleic acids and/or polypeptides the term can refer to
nucleic acids or polypeptides that are no longer flanked by the sequences typically flanking
them in nature.

The terms "polypeptide", "peptide" and "protein" are used interchangeably
herein to refer to a polymer of amino acid residues. The terms apply to amino acid polymers
in which one or more amino acid residue is an artificial chemical analogue of a
corresponding naturally occurring amino acid, as well as to naturally occurring amino acid
polymers. The term also includes variants on the traditional peptide linkage joining the
amino acids making up the polypeptide.

The terms "nucleic acid" or "oligonucleotide" or grammatical equivalents
herein refer to at least two nucleotides covalently linked together. A nucleic acid of the
present invention is preferably single-stranded or double-stranded and will generally contain
phosphodiester bonds, although in some cases, as outlined below, nucleic acid analogs are
included that may have alternate backbones, comprising, for example, phosphoramide
(Beaucage et al. (1993) Tetrahedron 49(10): 1925 and references therein; Letsinger (1970)
J. Org. Chem. 35: 3800; Sprinzl et al. (1977) Eur. J. Biochem. 81: 579; Letsinger et al. (1986)
Nucl. Acids Res. 14: 3487; Sawai et al. (1984) Chem. Lett. 805; Letsinger et al. (1988) J. Am.
Chem. Soc. 110: 4470; and Pauwels et al. (1986) Chemica Scripta 26: 1419),
phosphorothioate (Mag et al. (1991) Nucleic Acids Res. 19: 1437; and U.S. Patent No.
5,644,048), phosphorodithioate (Briu et al. (1989) J. Am. Chem. Soc. 111: 2321),
O-methylphosphoroamidite linkages (see Eckstein, Oligonucleotides and Analogues: A
Practical Approach, Oxford University Press), and peptide nucleic acid backbones and
linkages (see Egholm (199…) … Engl. 31: 1008; Nielsen (1993) Nature 365: 566;
Carlsson et al. (1996) Nature 380: 207).
Other analog nucleic acids include those with positive backbones (Denpcy et al. (1995)
Proc. Natl. Acad. Sci. USA 92: 6097); non-ionic backbones (U.S. Patent Nos. 5,386,023,
5,637,684, 5,602,240, 5,216,141 and 4,469,863; Angew. (1991) Chem. Intl. Ed. English 30:
423; Letsinger et al. (1988) J. Am. Chem. Soc. 110: 4470; Letsinger et al. (1994) Nucleoside
& Nucleotide 13: 1597; Chapters 2 and 3, ASC Symposium Series 580, "Carbohydrate
Modifications in Antisense Research", Ed. Y.S. Sanghui and P. Dan Cook; Mesmaeker et al.
(1994) Bioorganic & Medicinal Chem. Lett. 4: 395; Jeffs et al. (1994) J. Biomolecular NMR
34: 17; Tetrahedron Lett. 37: 743 (1996)) and non-ribose backbones, including those
described in U.S. Patent Nos. 5,235,033 and 5,034,506, and Chapters 6 and 7, ASC
Symposium Series 580, Carbohydrate Modifications in Antisense Research, Ed. Y.S.
Sanghui and P. Dan Cook. Nucleic acids containing one or more carbocyclic sugars are also
included within the definition of nucleic acids (see Jenkins et al. (1995) Chem. Soc. Rev.
pp. 169-176). Several nucleic acid analogs are described in Rawls, C & E News June 2, 1997
page 35. These modifications of the ribose-phosphate backbone may be done to facilitate the
addition of additional moieties such as labels, or to increase the stability and half-life of such
molecules in physiological environments.

The term "heterologous" as it relates to nucleic acid sequences such as coding
sequences and control sequences, denotes sequences that are not normally associated with a
region of a recombinant construct, and/or are not normally associated with a particular cell.
Thus, a "heterologous" region of a nucleic acid construct is an identifiable segment of
nucleic acid within or attached to another nucleic acid molecule that is not found in
association with the other molecule in nature. For example, a heterologous region of a
construct could include a coding sequence flanked by sequences not found in association
with the coding sequence in nature. Another example of a heterologous coding sequence is a
construct where the coding sequence itself is not found in nature (e.g., synthetic sequences
having codons different from the native gene). Similarly, a host cell transformed with a
construct which is not normally present in the host cell would be considered heterologous for
purposes of this invention.

A "coding sequence" or a sequence which "encodes" a particular polypeptide
(e.g., a PKS, an NRPS), is a nucleic acid sequence which is ultimately transcribed and/or
translated into that polypeptide in vitro and/or in vivo when placed under the control of
appropriate regulatory sequences. In certain embodiments, the boundaries of the coding
sequence are determined by a start codon at the 5' (amino) terminus and a translation stop
codon at the 3' (carboxy) terminus. A coding sequence can include, but is not limited to,
cDNA from procaryotic or eucaryotic mRNA, genomic DNA sequences from procaryotic or
eucaryotic DNA, and even synthetic DNA sequences. In preferred embodiments, a
transcription termination sequence will usually be located 3' to the coding sequence.

Expression "control sequences" refers collectively to promoter sequences,
ribosome binding sites, polyadenylation signals, transcription termination sequences,
upstream regulatory domains, enhancers, and the like, which collectively provide for the
transcription and translation of a coding sequence in a host cell. Not all of these control
sequences need always be present in a recombinant vector so long as the desired gene is
capable of being transcribed and translated.

"Recombination" refers to the reassortment of sections of DNA or RNA
sequences between two DNA or RNA molecules. "Homologous recombination" occurs
between two DNA molecules which hybridize by virtue of homologous or complementary
nucleotide sequences present in each DNA molecule.

The terms "stringent conditions" or "hybridization under stringent conditions"
refer to conditions under which a probe will hybridize preferentially to its target. "Stringent
hybridization" and "stringent hybridization wash conditions" in the context of nucleic acid
hybridization experiments such as Southern and northern hybridizations are sequence
dependent, and are different under different environmental parameters. An extensive guide
to the hybridization of nucleic acids is found in Tijssen (1993) …,
Elsevier, New York. Generally, highly stringent hybridization and wash conditions are
selected relative to the thermal melting point (Tm) for the specific sequence
at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength
and pH) at which 50% of the target sequence hybridizes to a perfectly matched probe. Very
stringent conditions are selected to be equal to the Tm for a particular probe.

An example of stringent hybridization conditions for hybridization of
complementary nucleic acids which have more than 100 complementary residues on a filter
in a Southern or northern blot is 50% formamide with 1 mg of heparin at 42°C, with the
hybridization being carried out overnight. An example of highly stringent wash conditions is
0.15 M NaCl at 72°C for about 15 minutes. An example of stringent wash conditions is a
0.2x SSC wash at 65°C for 15 minutes (see Sambrook et al. (1989) Molecular Cloning - A
Laboratory Manual (2nd ed.) Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor
Press, NY, for a description of SSC buffer). Often, a high stringency wash is preceded by a
low stringency wash to remove background probe signal. An example medium stringency
wash for a duplex of, e.g., more than 100 nucleotides, is 1x SSC at 45°C for 15 minutes. An
example low stringency wash for a duplex of, e.g., more than 100 nucleotides, is 4-6x SSC at
40°C for 15 minutes. In general, a signal to noise ratio of 2x (or higher) than that observed
for an unrelated probe in the particular hybridization assay indicates detection of a specific
hybridization. Nucleic acids which do not hybridize to each other under stringent conditions
are still substantially identical if the polypeptides which they encode are substantially
identical. This occurs, e.g., when a copy of a nucleic acid is created using the maximum
codon degeneracy permitted by the genetic code.
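
The point about codon degeneracy can be illustrated with a short sketch. It assumes the standard genetic code (only the handful of codons needed here are listed), and both DNA sequences are hypothetical toy examples rather than blm sequences: they differ at several third-codon positions yet translate to the same peptide.

```python
# Partial standard genetic code: only the codons used in this illustration.
CODON_TABLE = {
    "ATG": "M",
    "CAT": "H", "CAC": "H",
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",
    "GAT": "D", "GAC": "D",
    "GGT": "G", "GGC": "G", "GGA": "G", "GGG": "G",
}


def translate(dna):
    """Translate a DNA coding sequence codon by codon."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def count_mismatches(a, b):
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(1 for x, y in zip(a.upper(), b.upper()) if x != y)


# Hypothetical toy sequences that differ only at silent (third) positions.
seq_a = "ATGCATCATGCTGATGGT"
seq_b = "ATGCACCACGCGGACGGC"

print(translate(seq_a), translate(seq_b))
print(count_mismatches(seq_a, seq_b), "of", len(seq_a), "nucleotides differ")
```

Whether two such sequences would still hybridize under a given wash condition depends on length and composition; the sketch only shows that the encoded polypeptides can be identical despite nucleotide differences.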

A "library" or "combinatorial library" of polyketides and/or polypeptides is
intended to mean a collection of polyketides and/or polypeptides (or other molecules)
catalytically produced by a PKS and/or NRPS and/or hybrid PKS/NRPS (or other possible
combination of synthetic elements) gene cluster. The library can be produced by a gene
cluster that contains any combination of native, homolog or mutant genes from aromatic,
modular or fungal PKSs and/or NRPSs. The combination of genes can be derived from a
single PKS and/or NRPS gene cluster, e.g., act, fren, gra, tcm, whiE, gris, ery, or the like,
and may optionally include genes encoding tailoring enzymes which are capable of
catalyzing the further modification of a polypeptide, polyketide, or other molecule.
Alternatively, the combination of genes can be rationally or stochastically derived from an
assortment of NRPS and/or PKS gene clusters. The library of polyketides and/or
polypeptides and/or other molecules thus produced can be tested or screened for biological,
pharmacological or other activity.

By "random assortment" is intended any combination and/or order of genes,
homologs or mutants which encode for the various PKS and/or NRPS enzymes, modules,
active sites or portions thereof derived from aromatic, modular or fungal PKS and/or NRPS
gene clusters.
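
The combinatorial nature of such a library can be sketched as a simple enumeration. This is an illustration under stated assumptions, not a description of how any library was actually built: the function name is hypothetical, the module labels are borrowed from the text only as placeholders, and the choice of which modules compete for which position is invented for the example.

```python
from itertools import product


def enumerate_assortments(choices_per_position):
    """List every ordered combination, one module chosen per position."""
    return [tuple(combo) for combo in product(*choices_per_position)]


# Hypothetical design: three positions in a pathway, each with its own
# candidate modules (labels are placeholders, not a real construct).
choices = [
    ["NRPS-1", "NRPS-2"],
    ["PKS"],
    ["NRPS-3", "NRPS-4", "NRPS-5"],
]

assortments = enumerate_assortments(choices)
print(len(assortments), "candidate gene combinations")
for combo in assortments:
    print(" -> ".join(combo))
```

The number of combinations is the product of the number of candidates at each position, which is why even a small assortment of modules can yield a sizeable library to screen.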

By "genetically engineered host cell" is meant a host cell where the native
PKS and/or NRPS gene cluster has been altered or deleted using recombinant DNA
techniques or a host cell into which a heterologous PKS and/or NRPS and/or hybrid
PKS/NRPS gene cluster has been inserted. Thus, the term would not encompass mutational
events occurring in nature. A "host cell" is a cell derived from a procaryotic microorganism
or a eucaryotic cell line cultured as a unicellular entity, which can be, or has been, used as a
recipient for recombinant vectors bearing the PKS, NRPS, and/or hybrid gene clusters of the
invention. The term includes the progeny of the original cell which has been transfected. It
is understood that the progeny of a single parental cell may not necessarily be completely
identical in morphology or in genomic or total DNA complement to the original parent, due
to accidental or deliberate mutation. Progeny of the parental cell which are sufficiently
similar to the parent to be characterized by the relevant property, such as the presence of a
nucleotide sequence encoding a desired PKS, are included in the definition, and are covered
by the above terms.

Expression vectors are defined herein as nucleic acid sequences that direct
the transcription of cloned copies of genes/cDNAs and/or the translation of their mRNAs in
an appropriate host. Such vectors can be used to express genes or cDNAs in a variety of
hosts such as bacteria, blue-green algae, plant cells, insect cells and animal cells. Expression
vectors include, but are not limited to, cloning vectors, modified cloning vectors, specifically
designed plasmids or viruses. Specifically designed vectors allow the shuttling of DNA
between hosts, such as bacteria-yeast or bacteria-animal cells. An appropriately constructed
expression vector preferably contains: an origin of replication for autonomous replication in
a host cell, a selectable marker, optionally one or more restriction enzyme sites, optionally
one or more constitutive or inducible promoters. In preferred embodiments, an expression
vector is a replicable DNA construct in which a DNA sequence encoding one or more PKS
and/or NRPS domains and/or modules is operably linked to suitable control sequences
capable of effecting the expression of the products of these synthases and/or synthetases in a
suitable host. Control sequences include a transcriptional promoter, an optional operator
sequence to control transcription and sequences which control the termination of
transcription and translation, and so forth.

A "bleomycin open reading frame", or "bleomycin ORF", or "BLM Orf"
refers to a nucleic acid open reading frame that encodes a polypeptide or polypeptide domain
that has an enzymatic activity used in the biosynthesis of a bleomycin.

A "PKS/NRPS/PKS" system refers to a synthetic system comprising an NRPS 
5 flanked by two PKSs. A "NRPS/PKS/NRPS" system refers to a synthetic system comprising 
a PKS flanked by two NRPSs. A "hybrid PKS/NRPS system" or a "hybrid NRPS/PKS 
system" refers to a hybrid synthetic system comprising at least one PKS and one NRPS 
module The system can comprise multiple modules and the order can vary. 

A "biological molecule that is a substrate for a polypeptide encoded by a 
10 bleomycin biosynthesis gene" refers to a molecule that is chemically modified by one or 
more polypeptides enccoded by open reading frame( S ) of the blm gene cluster. The 
"substrate" may be a native molecule that typically participates in the biosynthesis of a 
bleomycin, or can be any other molecule that can be similarly acted upon by the polypepUde. 

A "polymorphism" is a variation in the DNA sequence of some members of a 
15 species. A polymorphism is thus said to be "allelic," in that, due to the existence of the 
polymorphism, some members of a species may have the unmutated sequence (,, the 
original "allele") whereas other members may have a mutated sequence (i.e. the variant or 
mutant "allele"). In the simplest case, only one mutated sequence may exist, and the 
polymorphism is said to be diallelic. In the case of diallelic diploid organisms, three 
20 genotypes are possible. They can be homozygous for one allele, homozygous for the other 
allele or heterozygous. In the case of diallelic haploid organisms, they can have one allele or 
the other, thus only two genotypes are possible. The occurrence of alternative mutations can 
give rise to trialleleic, etc. polymorphisms. An allele may be referred to by the nucleotides) 

that comprise the mutation. 
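The genotype counts stated above (three for a diallelic diploid, two for a diallelic haploid) follow from counting unordered allele combinations. The short Python sketch below illustrates that counting; the function name and the allele labels "A", "G" and "T" are hypothetical and chosen only for the example.

```python
from itertools import combinations_with_replacement


def genotypes(alleles, ploidy):
    """Return the distinct unordered genotypes for the given alleles and ploidy."""
    return ["/".join(g) for g in combinations_with_replacement(sorted(alleles), ploidy)]


if __name__ == "__main__":
    # Diallelic site with hypothetical allele labels "A" (original) and "G" (variant).
    print(genotypes(["A", "G"], ploidy=2))   # diploid organism
    print(genotypes(["A", "G"], ploidy=1))   # haploid organism
    # A triallelic site in a diploid organism gives more combinations.
    print(len(genotypes(["A", "G", "T"], ploidy=2)))
```

For n alleles in a diploid organism the same count can be written as n(n + 1)/2.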
"Single nucleotide polymorphisms" or "SNPs" are defined by their
characteristic attributes. A central attribute of such a polymorphism is that it contains a
polymorphic site, "X," most preferably occupied by a single nucleotide, which is the site of
the polymorphism's variation (Goelet and Knapp U.S. patent application Ser. No.
08/145,145). Methods of identifying SNPs are well known to those of skill in the art (see,
e.g., U.S. Patent 5,952,174).

The following abbreviations are used herein: A, adenylation; ACP, acyl
carrier protein; AT, acyltransferase; BLM, bleomycin; C, condensation; Cy,
condensation/cyclization; KR, ketoreductase; KS, ketoacyl synthase; MT, methyltransferase;
NRPS, nonribosomal peptide synthetase; orf, open reading frame; Ox, oxidation; PCP,
peptidyl carrier protein; PCR, polymerase chain reaction; PKS, polyketide synthase; Sv,
Streptomyces verticillus; ArCP, aryl carrier protein; bp, base pair; CoA, coenzyme A; DTT,
dithiothreitol; FAS, fatty acid synthase; kb, kilobase; PPTase, 4'-phosphopantetheinyl
transferase; TCA, trichloroacetic acid; and DEBS, 6-deoxyerythronolide B synthase.

BRIEF DESCRIPTION OF THE DRAWINGS

Figures 1A and 1B illustrate the biosynthetic pathway for bleomycin in Sv
(ATCC 15003). Figure 1A illustrates a biosynthetic pathway for BLM in Sv ATCC 15003;
intermediates except those in brackets were identified. Figure 1B shows a linear model for
the Blm megasynthetase-templated assembly of the BLM peptide/polyketide/peptide
aglycone from nine amino acids and one acetate; shaded circles represent atypical domains
carrying out the proposed novel chemistry, and arrows with broken lines indicate where
biosynthetic intermediates were derailed. Three-letter amino acid designations were used.
[HO], hydroxylation; [H], reduction.

Figure 2 provides a restriction map and gene organization of the blm gene
cluster from Sv ATCC 15003 (B, BamHI). Proposed functions for individual open reading
frames are summarized in Tables I and II. Modules for individual NRPS and PKS are
given along with their proposed substrates in parentheses.

Figures 3A, 3B, 3C, and 3D illustrate the determination of substrate
specificity for NRPS-1 and NRPS-6. Figure 3A shows a comparison of the A3 to A6 region
of A domains to 84 NRPS modules available at GenBank that activate various amino acids.
Figure 3B shows a comparison of amino acid residues that putatively line the substrate
binding pockets for A domains (single-letter amino acid designations were used). The
number following the protein name indicates the order of a particular module within a
multimodular NRPS protein. The protein accession numbers are P48663 P19
(AngR), AAC06346 (BacA-2), CAB03756 (MbtB), 3510629 (SyrE-7), 3114612 (AcmB-1),
CAA67248 (SnbC1), and 3560507 (FxbC-2). Dhb stands for 2,3-dehydroaminobutyric
acid. It is not known if Dhb is the direct substrate for SyrE-7 or results from dehydration of
an SyrE-7-activated Thr (Guenzi et al. (1998) J. Biol. Chem. 273: 32857-32863). Figure 3C
illustrates purified proteins after overexpression in E. coli as analyzed by electrophoresis on
a 10% SDS-polyacrylamide gel (the calculated molecular weights for NRPS-1A and NRPS-6A
are 64,212 and 61,899, respectively). Figure 3D illustrates substrate specificities as
determined by the ATP-PPi exchange reaction with the amino acids of BLM as substrates
(100% relative activity corresponds to 103,000 cpm for NRPS-1A and 256,000 cpm for
NRPS-6A).

Figure 4 illustrates a three-module NRPS/PKS/NRPS model for channeling
the growing intermediate between NRPS and PKS modules and between PKS and NRPS
modules. The KS, ACP, and C domains are shaded to emphasize their unique activities that
are responsible for elongating a growing peptide with a short carboxylic acid and a growing
polyketide with an amino acid in hybrid peptide/polyketide/peptide biosynthesis.

Figure 5 illustrates the use of the BlmVIII methyltransferase domain to introduce
branched methyl groups in a polyketide synthesis. PCK12 has been described by Kao
(1995) J. Am. Chem. Soc. 117: 9105-9106. DE-1, DE-2 and DE-3 are three representative
products demonstrating the strategy and utility of blmVIII in introducing a CH3 group in
polyketide biosynthesis.

Figure 6 illustrates the use of the blm NRPS and PKS enzymes to synthesize a
variety of hybrid polyketide/peptide molecules including, but not limited to, a family of
oxazolines/oxazoles, and thiazolines/thiazoles.

Figure 7 illustrates the use of elements of the blm gene cluster to synthesize
various sugars.

Figure 8A shows a restriction map of the blm gene cluster from Sv
ATCC 15003 (B, BamHI). Figure 8B shows the relative position of the blmI, blmII, and blmXI
genes to the two blmAB resistance genes (W, Blm resistance). Individual open reading
frames are represented by open arrows. Figure 8C shows the nucleotide sequence of the
blmI gene. The potential ribosome-binding site (RBS) and the conserved motif for
4'-phosphopantetheinylation are underlined. The sequence has been deposited into GenBank
under accession no.

Figure 9 shows an amino acid sequence comparison of BlmI with PCP
domains of known type I NRPSs (Grs-2 [P14688], 36% identity, 58% similarity; Srfa-3
[Q08787], 40% identity, 64% similarity; Vir-s [Y11547], 36% identity, 60% similarity; Saf-b
[U24657], 40% identity, 54% similarity). Given in brackets are nucleotide sequence
accession numbers. The shaded letters indicate similar amino acids. Consensus residues are
amino acids that are similar in more than three sequences. The signature motif for
4'-phosphopantetheinylation is underlined.

Figures 10A and 10B show the HPLC analysis of BlmI purified from E. coli
OG7001(pBS2) (Fig. 10A), and E. coli OG7001(pBS2/pDPT-Gsp) (Fig. 10B).

Figure 11 shows the enzyme architecture of type I and type II PKS and
NRPS. A, adenylation domain; ACP, acyl carrier protein or ACP domain; AT, acyl
transferase; C, condensation protein or C domain; KS, β-ketoacyl synthase domain; KSα,
β-ketoacyl synthase α subunit; KSβ, β-ketoacyl synthase β subunit; PCP, peptidyl carrier
protein or PCP domain.

Figure 12 illustrates the reaction catalyzed by phosphopantetheinyl
transferases (PPTases).

Figure 13 shows a restriction map and gene organization of the pptA locus
from Sv ATCC 15003.

DETAILED DESCRIPTION 

Polyketides and polypeptides can be assembled in a remarkably similar
manner by repetitive addition of an extending unit to a growing chain by polyketide
synthases (PKSs) and nonribosomal peptide synthetases (NRPSs), respectively. In the case of
polyketides, the extending unit is typically a fatty acid (activated as an acyl CoA thioester)
while the extending unit for polypeptides is typically an amino acid (activated as an
aminoacyl adenylate). Both the PKS and NRPS systems have evolved a modular
organization to define the number, sequence, and specificity of the incorporation of the
extending unit and utilize the 4'-phosphopantetheine prosthetic group to channel the
growing intermediate during the elongation process.

This invention pertains to the discovery that a PKS-bound growing polyketide
intermediate can be further elongated by an NRPS module, or conversely, an NRPS-bound
growing polypeptide intermediate can be further elongated by a PKS module. This
discovery permits the exploitation of NRPS, PKS, and hybrid NRPS/PKS systems to provide
a number of novel hybrid peptide/polyketide metabolites from amino acids and short fatty
acids.

It was also a discovery of this invention that this hybrid NRPS/PKS/NRPS
system is exemplified by the bleomycin (Blm) biosynthesis pathway in Streptomyces
verticillus (Sv) (ATCC 15003). The bleomycins are a family of glycopeptide-derived
antibiotics originally isolated by Umezawa in 1966 from the fermentation broth of S.
verticillus. Bleomycins (BLMs) exhibit strong anti-tumor activity and are currently used in the
treatment of lymphoma, particularly Hodgkin's disease, testicular tumors, squamous cell
carcinomas of skin, head, cervix, penis, and rectum, and for intracavitary therapy of malignant
effusions in ovarian and breast cancer. The commercial product, Blenoxane®, contains
BLM A2 and B2 as the principal constituents. Almost uniquely among anticancer drugs,
BLM does not cause myelosuppression, promoting its wide application in combination
chemotherapy.

In one aspect, this invention provides a cloned and characterized BLM gene
cluster consisting of characteristic NRPS and PKS genes from the Blm producer
Streptoverticillium sp. (ATCC 15003). The cloned and isolated Blm gene cluster provides a
method of recombinantly expressing bleomycin and/or bleomycin analogues. Thus, in one
embodiment, this invention provides for nucleic acids encoding bleomycin synthetic
machinery or subunits thereof, for cells recombinantly modified to express a bleomycin
and/or bleomycin analogue, and for a bleomycin or bleomycin analogue recombinantly
expressed in such cells.

Like other polyketide synthases or nonribosomal peptide synthetases, the
bleomycin synthetic pathway is organized into modules, each module catalyzing the addition
and/or modification of one subunit (e.g., fatty acid or amino acid). Each module is organized
into a number of domains, each domain having a characteristic activity (e.g., activation,
condensation, condensation/cyclization, etc.). The catalytic domains within a module and
the modules themselves are often arranged collinearly, and the order of biosynthetic modules
from NH2- to COOH-terminus on each PKS and NRPS polypeptide and the number and type
of catalytic domains within each determine the order of structural and functional elements in
the resulting product. The size and complexity of the ultimately formed product are
controlled by the number of repeated acyl chain extension steps that are, in turn, a function
of the number and placement of carrier protein domains in these multimodular enzymes.
The number, composition, and order of such domains can be altered either to introduce
modifications, e.g., into the bleomycin to produce bleomycin analogues, or to produce
different or completely new molecules. Such "recombination" is not restricted solely to
recombination among the bleomycin catalytic domains and/or modules, but can also involve
recombination between bleomycin modules and/or subunits and other PKS and/or NRPS
modules and/or subunits. Moreover, the discovery that synthetic pathways can incorporate
both PKS and NRPS modules and/or catalytic domains makes available hybrid PKS/NRPS
syntheses.
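The relationship between module order and product structure described above can be sketched as a small data model. The following Python sketch is illustrative only: the `Module` class, the classification rule (a KS domain marks a PKS module, an A domain marks an NRPS module), and the toy three-module system with placeholder substrates are assumptions made for this example and are not part of the disclosure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Module:
    name: str
    domains: List[str]   # catalytic domains in NH2- to COOH-terminal order
    substrate: str       # extending unit incorporated by this module

    def kind(self) -> str:
        # Simplifying assumption: a ketoacyl synthase (KS) domain marks a PKS
        # module and an adenylation (A) domain marks an NRPS module.
        if "KS" in self.domains:
            return "PKS"
        if "A" in self.domains:
            return "NRPS"
        return "unknown"


def system_type(modules: List[Module]) -> str:
    """Name the system by the order of module types, e.g. 'NRPS/PKS/NRPS'."""
    kinds = [m.kind() for m in modules]
    # Collapse runs of the same type so adjacent NRPS modules are reported once.
    collapsed = [k for i, k in enumerate(kinds) if i == 0 or k != kinds[i - 1]]
    return "/".join(collapsed)


def product_order(modules: List[Module]) -> List[str]:
    """Under colinearity, module order gives the order of building blocks."""
    return [m.substrate for m in modules]


if __name__ == "__main__":
    # Hypothetical three-module system; substrate names are placeholders.
    toy = [
        Module("module-1", ["C", "A", "PCP"], "amino acid 1"),
        Module("module-2", ["KS", "AT", "MT", "KR", "ACP"], "malonyl CoA"),
        Module("module-3", ["C", "A", "PCP"], "amino acid 2"),
    ]
    print(system_type(toy))
    print(product_order(toy))
```

Reordering or swapping entries in the list is the in-silico counterpart of the module "recombination" discussed in the text; the sketch says nothing about whether a given combination is enzymatically functional.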

Thus, in one embodiment this invention contemplates the use of blm gene
cluster modules and/or catalytic domains to make various peptide and/or polyketide, and/or
hybrid polypeptide/polyketide metabolites (including, but not limited to bleomycin

proteins (both type I and type II) and acyl carrier proteins (both type I and type II) into the
holo-proteins.

The Examples provided herein and the accompanying primers permit one of
ordinary skill in the art to isolate the blm gene cluster of this invention, its constituent ORFs,
various modules, or enzymatic domains. The isolated nucleic acid components can be used
to express one or more polypeptide components for in vivo (e.g., recombinant) synthesis of
one or more polypeptides and/or polyketides as indicated above. It will also be appreciated
that the blm cluster polypeptides can be used for ex vivo assembly of various
macromolecules.

I. BLM gene cluster and the PPTase gene.

A) The BLM gene cluster.

The nucleic acids comprising the blm gene cluster are identified in Tables I
and II and listed in the sequence listing provided herein (SEQ ID NOS: 1 and 2, GenBank
Accession numbers AF149091, AF210249, AF210311). In particular, Table I identifies
genes and functions of open reading frames (ORFs) responsible for the biosynthesis of the
hybrid peptide/polyketide/peptide backbone and sugar moieties of bleomycin, while Table II
identifies a number of ORFs comprising the blm gene cluster, identifies the activity of the
catalytic domain encoded by each ORF and provides primers for the amplification and
isolation of that ORF.

As illustrated in Example 1, the blm cluster comprises a PKS module, flanked
by several NRPS modules, along with several sugar biosynthesis genes and genes encoding
other biosynthesis enzymes as well as several resistance and regulatory genes (Table I).



Table I. Determined functions of ORFs in the bleomycin biosynthesis gene cluster.

Gene      Amino acids   Sequence homolog                        Proposed function
orf8      424           YqeR (BAA12461)                         Oxidase
blmC      498           RfaE (AA07904.1)                        NDP-glucose synthase
blmI      90            GrsB (P14688)                           Type II PCP
blmD      545           NodU (Q53515)                           Carbamoyl transferase
blmE      390           RfaF (AAD16056)                         Glycosyl transferase
orf13     187           MbtH (O05821)                           Unknown
blmII     462           Nrp (CAA98937)                          NRPS condensation enzyme
orf15     339           SyrP (1890776)                          Regulation
blmIII    935           HMWP2 (P48633), McbC (P23185)           A PCP Ox
blmIV     2626          HMWP2 (P48633)                          C A PCP Cy A PCP Cy
orf18     638           AsnB (2293165)                          Asparagine synthetase
blmF      494           RfbC (Q50864)/BlmOrf1 (507319)          Glycosyl transferase/β-hydroxylase
blmG      325           YtcB (2293288)                          Sugar epimerase
blmV      645           McyB (2708278)                          PCP C
blmVI     2675          ACoAS (1658531), PksD (S73014),         A(3) ACP C A PCP C A
                        SnbDE (CAA67249)
blmVII    1218          SyrE (3510629)                          C A PCP
blmVIII   1841          HMWP1 (CAA73127)                        KS AT MT KR ACP
blmIX     1066          SafB (1171128)                          C A PCP
blmX      2140          TycC (2623773)                          C A PCP C A PCP
blmXI     688           SyrE (3510629)                          NRPS condensation enzyme
orf28     239           SC9C7.04C (CAA22716)                    Unknown
orf29     582           YvdB (CAB08068)                         Transmembrane transporter
orf30     113           SmtB (P30340)                           Regulation
orf31                   PhnA (P16680)

1. Protein accession numbers are given in parentheses. 2. Underlined domains contain motifs that
differ from known NRPS or PKS domains. 3. This A domain lacks the typical NRPS A1, A2, A4, A8, and
A9 motifs and more closely resembles acyl CoA synthases. ORF1 to ORF7 were reported by Schmidt (1994)
Gene 151:17-21, who assigned ORF2 as blmA and ORF4 as blmB.

Noteworthy are the genes encoding the NRPS and PKS enzymes. The blmI,
blmII, and blmXI genes encode NRPSs with an unusual architecture. In contrast to all known
NRPSs, which are of modular organization with each module consisting minimally of a
condensation (C), an adenylation (A), and a peptidyl carrier protein (PCP) domain, BlmI,
BlmII and BlmXI are discrete proteins homologous to individual domains of type I NRPSs.
We have characterized BlmI as a type II PCP (Du and Shen (1999) Chem. Biol. 6: 507-517).
The BlmII and BlmXI proteins can serve as candidates for type II condensation enzymes.

The blmIII, blmIV, blmV, blmVI, blmVII, blmIX, and blmX genes encode
modular NRPSs consisting of domains characteristic for known type I NRPSs, such as the A,
PCP, C, and condensation/cyclization (Cy) domains, as well as an unprecedented oxidation
(Ox) domain. BlmVI is unique among all the Blm NRPSs identified. Its N-terminal module
(NRPS-5) consists of an atypical A domain, which bears a close resemblance to a family of
acyl CoA synthases (Fitzmaurice and Kolattukudy (1997) J. Bacteriol. 179: 2608-2615;
Fitzmaurice and Kolattukudy (1998) J. Biol. Chem. 273: 8033-8039), and an acyl carrier
protein (ACP)-like domain. Its C-terminal module is truncated and presumably interacts
with BlmV to constitute the complete NRPS-3 module (Fig. 1B). Also noteworthy are the C
domain of NRPS-3, which lacks both His residues of the conserved HHxxxDG (SEQ ID NO: 4)
active site for transpeptidation (Stachelhaus et al. (1998) J. Biol. Chem. 273: 22773-22781),
and the extra C domain at the C-terminus of BlmV. These unusual features associated with
BlmVI and BlmV may play roles in the formation of the β-aminoalaninamide and the
pyrimidine moieties of BLM, which are unprecedented in peptide biosynthesis.

The blmVIII gene encodes a PKS module consisting of domains characteristic
for known PKSs, such as ketoacyl synthase (KS), acyltransferase (AT), ketoreductase (KR),
and ACP, with malonyl CoA acting as an extending unit according to sequence comparison
of the AT domain (Haydock et al. (1995) FEBS Lett. 374: 246-248) (Fig. 1B).

The identification of an integrated methyltransferase (MT) domain in the
middle of BlmVIII is unique, representing the first PKS from actinomycetes that contains an
internal MT domain.



Table II. Blm gene cluster open reading frames (ORFs) and primers for ORF amplification.

Orf#    Position       Activity                              Method                      Primers (5' to 3')     SEQ ID NO
orf-8   76183-77457    Oxygen-independent                    Gapped-BLAST comparison(1)  F: ATGAGCCACGCCATCGGA  5
                       coproporphyrinogen III oxidase                                    R: TCAGGCGCGTTCGGGGGC  6
orf-9   74690-76186    ADP-heptose synthase (blmC)           Gapped-BLAST comparison(1)  F: GTGAACACCGACCTGCCC  7
                                                                                         R: TCATGGGGTGTCTCCCTC  8
orf-10  74421-74693    Peptidyl carrier protein (blmI)       Expression and biochemical  F: ATGAGCGCCCCGCGGGGC  9
                                                             characterization(2)         R: TCACCGGTCCCGCTCCCC  10
orf-11  72787-74424    Carbamyltransferase (blmD)            Gapped-BLAST comparison(1)  F: ATGAGCGCCGACCCGTCC  11
                                                                                         R: TCATGAGCGGGCCGCCGT  12
orf-12  71618-72790    ADP-heptose:LPS heptosyl              Gapped-BLAST comparison(1)  F: ATGACCACCCCCATGACC  13
                       transferase (blmE)                                                R: TCATGGGGTACTCCTGAT  14
orf-13  70983-71546    Homolog of mbtH in the synthesis      Gapped-BLAST comparison(1)  F: ATGACCACGACCCCGCGG  15
                       of mycobactin                                                     R: TCAGGTGCCGGACACGCG  16
orf-14  69598-70986    Peptide synthetase                    Gapped-BLAST comparison(1)  F: GTGACCGCCCCCGGCACA  17
                       (condensation, blmII)                                             R: TCATCGGTGGCTCCTCGT  18
orf-15  68582-69601    Regulatory gene (homolog of syrP)     Gapped-BLAST comparison(1)  F: GTGAACCGGCACGGCCCC  19
                                                                                         R: TCACGCGCTCACCTCGTC  20
orf-16  65778-68585    Mutated peptide synthetase-oxidase    Gapped-BLAST comparison(1)  F: GTGACGAGCGCCCGGCCC  21
                       (NRPS-0, blmIII)                                                  R: TCACGGGGCCTCCGTGCG  22
orf-17  57901-65781    Peptide synthetase                    Expression and biochemical  F: ATGCTGCACGGCGCCGCG  23
                       (NRPS-2-1, blmIV)                     characterization(2)         R: TCACTCCGGTCCACCTCC  24
orf-18  55899-57815    Asparagine synthetase                 Gapped-BLAST comparison(1)  F: GTGAGGCCCGTGTGCGGC  25
                                                                                         R: TCAGCCACCGTTGCCGCC  26
orf-19  54418-55902    Homolog of hydroxylase-dehydrogenase  Gapped-BLAST comparison(1)  F: GTGAAGGACCTCGGCCGG  27
                       (blmF)                                                            R: TCACTCCCCCGGTGCCGG  28
orf-20  53427-54404    Nucleotide-sugar epimerase (blmG)     Gapped-BLAST comparison(1)  F: GTGACATGGACCGTGGTG  29
                                                                                         R: TCAGGCATCGGCCCTCCC  30
orf-21  51493-53430    Peptide synthetase (NRPS-3CT, blmV)   Gapped-BLAST comparison(1)  F: ATGCGCGGGCATGACGAC  31
                                                                                         R: TCACGGTGTCTCTCCCTC  32
orf-22  43263-51290    Peptide synthetase                    Expression and biochemical  F: ATGAGCCGGCCGGCCGGC  33
                       (NRPS-5-4-3, blmVI)                   characterization(2)         R: TCATGCTCGGTCATCGCC  34
orf-23  39610-43266    Peptide synthetase (NRPS-6, blmVII)   Expression and biochemical  F: GTGACCACGCCCCGCATC  35
                                                             characterization(2)         R: TCATTCGGGACGCGGGCA  36
orf-24  34088-39613    Polyketide synthase (blmVIII)         Gapped-BLAST comparison(1)  F: ATGAGCCATGCCGACGCG  37
                                                                                         R: TCACAGCACCACCTCTTC  38
orf-25  30891-34091    Peptide synthetase (NRPS-7, blmIX)    Gapped-BLAST comparison(1)  F: ATGACCCCGGCCGCCGAC  39
                                                                                         R: TCATCGTCCGCCGCCTTT  40
orf-26  24406-30894    Peptide synthetase                    Gapped-BLAST comparison(1)  F: ATGCCTCGGTGTGCCCGA  41
                       (NRPS-9-8, blmX)                                                  R: TCATTCGGCGGCACCTCC  42
orf-27  22127-24193    Peptide synthetase                    Gapped-BLAST comparison(1)  F: GTGGGTTTCCGTCGAGCG  43
                       (condensation, blmXI)                                             R: TTACACCCTCCGTTTCTC  44
orf-28  21367-22086    Phosphatidylserine decarboxylase      Gapped-BLAST comparison(1)  F: ATGGCACAGGACCTGAAC  45
                                                                                         R: TCAACGCCACCGGATCTT  46
orf-29  19161-20909    Transmembrane transporter             Gapped-BLAST comparison(1)  F: GTGAGCTCCCTCGCCGTC  47
                                                                                         R: TCATCGTCGGGCACTCGG  48
orf-30  18823-19164    Metal dependent regulatory element    Gapped-BLAST comparison(1)  F: GTGCCGGTTCCGCTGTAT  49
                                                                                         R: TCACCGGGCACTGACCTC  50
orf-31  18660-18307    PHNA homolog                          Gapped-BLAST comparison(1)  F: GTGACCGAGAACCTTCCG  51
                                                                                         R: TCAGACCTTCTTGACCAC  52
orf-32  17736-         Peptide synthetase                    Gapped-BLAST comparison(1)  F: ATGGCCTCAGACGCTTTG  53
orf-33  9214-7859      Putative transporter                  Gapped-BLAST comparison(1)  F: ATGATGAAGTCAAGCCGC  55
                                                                                         R: TCAGTGGCTTACAAGGAG  56
orf-34  7797-6784      Homolog of clavaminic acid synthase   Gapped-BLAST comparison(1)  F: ATGACTGACCTGCCGTTG  57
                                                                                         R: TCACACCAGCAGCGAGGT  58
orf-35  6773-6021      Thioesterase                          Gapped-BLAST comparison(1)  F: ATGGATTTCCCCCTCACC  59
                                                                                         R: TCATGCCCCTACCTCGGC  60
orf-36  6024-4741      Putative transporter                  Gapped-BLAST comparison(1)  F: ATGACCGCGCGCGTCGAC  61
                                                                                         R: TCACTCCTCGGCTTCGGC  62
orf-37  4733-3915      Unknown                               Gapped-BLAST comparison(1)  F: GTGTCCAAGAACGCGGCG  63
                                                                                         R: TCATCGGCTCGCCTCGTG  64
orf-38  3918-2182      Peptide synthetase (NRPS-12)          Gapped-BLAST comparison(1)  F: ATGACCCTCACCCTGCGG  65
                                                                                         R: TCACTCGGGCACTCCTTC  66
orf-39  2185-1199      Regulatory gene (homolog of SyrP)     Gapped-BLAST comparison(1)  F: GTGACCGGTTCCGTAACG  67
                                                                                         R: TCATGAGTCCGCCGAGGT  68
orf-40  1015-1         Peptide synthetase                    Gapped-BLAST comparison(1)  F: ATGACAGAGGTCCGAGGT  69
                                                                                         R: CCCGGCAACCGCCCTCCC  70
orf-41  On a separate  4'-phosphopantetheinyl transferase    Expression and biochemical  F: GTGATCGCCGCCCTCCTG  71
        sequence       (pptA)                                characterization(2)         R: TTACGGGACGGCGGTCCG  72



The Blm megasynthetase comprises nine NRPS modules and one PKS
module forming a hybrid NRPS/PKS/NRPS megasynthetase (Fig. 1A). Inspection of the blm
gene cluster (Fig. 2) showed that the Blm NRPS and PKS modules apparently are not
organized according to the "colinearity rule" for BLM biosynthesis (Fig. 1). Detailed
functional organization of the megasynthetase and the BLM synthetic pathway is provided in
Example 1.

B) PPTase

This invention also provides the gene (pptA, Fig. 13) encoding
phosphopantetheine transferase (PPTase) (GenBank Accession No: AF210311) (see SEQ ID
NO: 3). PPTase converts carrier proteins for the growing acyl chain from inactive apo-forms
to functional holo-forms by the covalent attachment of the 4'-phosphopantetheine moiety of
coenzyme A to a conserved serine residue of the carrier-protein substrate (see, e.g., Fig. 1A).

Using the sequence information provided herein (e.g., primer sequences and
PPTase sequence) the PPTase nucleic acids can be routinely isolated according to standard
methods (e.g., PCR amplification). Detailed protocols for the isolation of the PPTase are
provided in Example 3.

Other PPTases can be identified using the probes and primers illustrated in
Example 3. Briefly, using a primer to the THC motif (5'-C GGC ATG GTC GGC TCC HTN
CAN CAY TG-3', SEQ ID NO: 73, where H = C + A, N = A + C + T + G, Y = C + T, K = G
+ T, R = A + G, W = T + A), and a primer designed around the typical C-terminal PPTase
motif (e.g., KEA-1: 5'-T GCA GCA GAA CAG GAG GCK NYC CCA NKG-3', SEQ ID
NO: 74 and KEA-2: 5'-TG GGT CAG CGG GTA CCA NRC YTT RWA-3', SEQ ID NO:
75) and using S. verticillus chromosomal DNA as template, a probe (about 250 bp) that
specifically binds to a PPTase sequence can be amplified with the primer set THC/KEA-2.
Libraries of organisms comprising NRPS, PKS, and/or hybrid PKS/NRPS pathways can be
probed for the presence of a PPTase sequence. Once hybridizing clones are identified, the
PPTase sequence can be isolated according to standard methods well known to those of skill
in the art (see, e.g., Example 3).
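Each degenerate primer above stands for a pool of concrete oligonucleotides. The Python sketch below counts and enumerates that pool for the THC, KEA-1 and KEA-2 primers. The degenerate-base table follows the definitions given in the text (H is taken as C + A exactly as stated there, which is narrower than the usual IUPAC meaning of H); the function names are hypothetical.

```python
from itertools import product

# Degenerate base codes as defined in the text (H is taken as C + A as stated).
CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "H": "CA", "N": "ACTG", "Y": "CT", "K": "GT", "R": "AG", "W": "TA",
}

# Primer sequences from the text with the spacing removed.
PRIMERS = {
    "THC": "CGGCATGGTCGGCTCCHTNCANCAYTG",
    "KEA-1": "TGCAGCAGAACAGGAGGCKNYCCCANKG",
    "KEA-2": "TGGGTCAGCGGGTACCANRCYTTRWA",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences represented by a degenerate primer."""
    n = 1
    for base in primer:
        n *= len(CODES[base])
    return n


def expand(primer: str):
    """Yield every concrete sequence represented by a degenerate primer."""
    for combo in product(*(CODES[base] for base in primer)):
        yield "".join(combo)


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, len(seq), "nt, degeneracy", degeneracy(seq))
```

Higher degeneracy means each individual sequence is present at a lower effective concentration in the reaction, which is one reason primer pairs are often compared before use.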
C) Isolation/preparation of nucleic acids.

In one embodiment, this invention provides nucleic acids for the recombinant
expression of a bleomycin. Such nucleic acids include isolated gene cluster(s) comprising
open reading frames encoding polypeptides sufficient to direct the assembly of a bleomycin.

In other embodiments of this invention, modified bleomycins (e.g., bleomycin
analogs), novel polyketides, polypeptides, and combinations thereof (polyketide/polypeptide
hybrids) are created by modifying PKSs and/or NRPSs so as to introduce variations into
known polymers synthesized by the enzymes. Such variations may be introduced by design,
for example to modify a known molecule in a specific way, e.g., by replacing a single
monomeric unit within a polymer with another, thereby creating a derivative molecule of
predicted structure. Alternatively, variations can be made randomly, for example by making
a library of molecular variants of a known polymer by systematically or haphazardly
replacing one or more modules or enzymatic domains in a known PKS or NRPS with a
collection of alternative modules or domains. Production of alternative/modified PKSs,
NRPSs and hybrid systems is described below.

Using the primer and sequence information provided herein, one of ordinary
skill in the art can routinely isolate/clone the PKS and/or NRPS modules and/or enzymatic
domains described herein. For example, the PCR primers provided in Table II, above, can
be used to amplify any of the ORFs identified therein. Moreover, using the sequence
information for the blm gene cluster provided herein, the design of other primers suitable for
the amplification of individual ORFs, combinations of ORFs, genes, etc. is routine.
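The Table II primers follow a simple pattern: the forward primer is the first 18 nucleotides of the ORF (beginning with an ATG or GTG start codon) and the reverse primer is the reverse complement of the last 18 nucleotides (so it usually begins with TCA or TTA, the complement of a stop codon). The Python sketch below shows that derivation. The toy ORF is hypothetical: its two ends are taken from the orf-10 primers in Table II and its middle section is invented for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def design_primers(orf: str, length: int = 18):
    """Return (forward, reverse) primers covering the two ends of an ORF."""
    orf = orf.upper()
    if len(orf) < 2 * length:
        raise ValueError("ORF is shorter than two primer lengths")
    forward = orf[:length]
    reverse = reverse_complement(orf[-length:])
    return forward, reverse


if __name__ == "__main__":
    # Hypothetical 60-nt ORF: real ends (orf-10 primers), invented middle.
    toy_orf = (
        "ATGAGCGCCCCGCGGGGC"          # 5' end, as in the orf-10 forward primer
        "GACCTGTTCGAGAAGGTCCGCAAG"    # invented filler, for illustration only
        "GGGGAGCGGGACCGGTGA"          # 3' end, ending in a TGA stop codon
    )
    fwd, rev = design_primers(toy_orf)
    print("F:", fwd)
    print("R:", rev)
```

Restriction sites for cloning could be added by prepending the site sequence to each primer; that step is omitted here.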

Typically such amplifications will utilize the DNA of an organism containing
the requisite genes (e.g., Streptomyces verticillus) as a template. Typical amplification
conditions include a PCR mixture consisting of 5 ng of S. verticillus genomic or plasmid
DNA as template, 25 pmoles of each primer, 25 nM dNTP, 5% DMSO, 2 units of Taq
polymerase, 1 x buffer, with or without 20% glycerol in a final volume of 50 µL. PCR is
carried out (e.g., on a GeneAmp PCR System 2400 (Perkin-Elmer/ABI)) with a cycling
scheme as follows: initial denaturing at 94°C for 5 min, 24-36 cycles of 45 sec at 94°C, 1
min at 60°C, 2 min at 72°C, followed by an additional 7 min at 72°C. One of skill will
appreciate that optimization of such a protocol, e.g., to improve yield, is routine (see, e.g.,
U.S. Patent No. 4,683,202; Innis (1990) PCR Protocols: A Guide to Methods and
Applications, Academic Press Inc., San Diego, CA). In addition, primers may be designed
to introduce restriction sites and so facilitate cloning of the amplified sequence into a vector.
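For planning purposes, the programmed hold time of the cycling scheme above can be added up directly. The Python sketch below does the arithmetic for the 24-cycle and 36-cycle ends of the stated range; it ignores temperature ramping between steps, so actual run times on an instrument will be longer. The function name and parameter names are hypothetical.

```python
def pcr_run_seconds(cycles: int,
                    initial_denature: int = 5 * 60,
                    denature: int = 45,
                    anneal: int = 60,
                    extend: int = 2 * 60,
                    final_extend: int = 7 * 60) -> int:
    """Total programmed hold time in seconds, ignoring ramping between steps."""
    return initial_denature + cycles * (denature + anneal + extend) + final_extend


if __name__ == "__main__":
    for n in (24, 36):
        print(n, "cycles:", pcr_run_seconds(n) / 60, "min")
```

With the default values taken from the text, 24 cycles correspond to 102 minutes of hold time and 36 cycles to 147 minutes.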
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Using the information provided herein other approaches to cloning the desired 
sequences will be apparent to those of skill in the art. For example, the PKS or NRPS 
modules or enzymatic domains of interest can be obtained from an organism that expresses 
the same, using recombinant methods, such as by screening cDNA or genomic libraries, 
derived from cells expressing the gene, or by deriving the gene from a vector known to 
include the same. The gene can then be isolated and combined with other desired NRPS 
and/or PKS modules or domains, using standard techniques. If the gene in question is 
already present in a suitable expression vector, it can be combined in situ, with, e.g., other 
PKS subunits, as desired. The gene of interest can also be produced synthetically, rather 
than cloned. The nucleotide sequence can be designed with the appropriate codons for the 
particular amino acid sequence desired. In general, one will select preferred codons for the 
intended host in which the sequence will be expressed. The complete sequence can be 
assembled from overlapping oligonucleotides prepared by standard methods and assembled 
into a complete coding sequence (see, e.g., Edge (1981) Nature 292: 756; Nambiar et al.
(1984) Science 223: 1299; Jay et al. (1984) J. Biol. Chem. 259: 6311). In addition, it is noted
that custom gene synthesis is commercially available (see, e.g. Operon Technologies, 
Alameda, CA). 
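
By way of illustration only, the selection of preferred codons and the division of the designed
sequence into overlapping oligonucleotides might be sketched as follows. The codon table is
an assumption made for the example (one G/C-rich codon per residue), not measured codon
usage for any host, and the tiling lists top-strand segments only.

```python
# Illustrative preferred-codon table (assumption: one G/C-rich codon per
# residue; not measured codon usage for any particular host).
PREFERRED = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCG",
    "S": "TCC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTC",
    "*": "TGA",
}


def back_translate(protein):
    """Design a coding sequence using the preferred codon for each residue."""
    return "".join(PREFERRED[aa] for aa in protein.upper())


def overlapping_oligos(seq, length=40, overlap=20):
    """Split seq into top-strand tiles; neighbours share `overlap` bases."""
    if length <= overlap:
        raise ValueError("oligo length must exceed the overlap")
    step = length - overlap
    return [seq[i:i + length] for i in range(0, max(len(seq) - overlap, 1), step)]


if __name__ == "__main__":
    gene = back_translate("MAKLV*")
    print(gene)
    print(overlapping_oligos(gene, length=12, overlap=6))
```

In practice alternate oligonucleotides would be synthesized as the reverse complement so that
neighbouring oligonucleotides anneal; that step is omitted here for brevity.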

Examples of such techniques and instructions sufficient to direct persons of 
skill through many cloning exercises are found in Berger and Kimmel (1989) Guide to 
Molecular Cloning Techniques. Methods in Enzymology 152 Academic Press, Inc., San 
Diego, CA (Berger); Sambrook et al. (1989) Molecular Cloning - A Laboratory Manual (2nd 
ed.) Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor Press, NY; Ausubel
(1994) Current Protocols in Molecular Biology, Current Protocols, a joint venture between
Greene Publishing Associates, Inc. and John Wiley & Sons, Inc., U.S. Patent 5,017,478; and 
European Patent No. 0,246,864. 

II. Expression of gene clusters, modules, and enzymatic domains.

As indicated above, in one embodiment this invention provides novel NRPS 
and PKS genes for the efficient recombinant production of both novel and known 
polyketides, peptides, and polyketide/polypeptide hybrids by expressing them in vivo. In 
other embodiments, such syntheses are carried out in vitro. Even in vitro syntheses, 
however, typically utilize recombinantly expressed PKSs, NRPSs, or enzymatic domains 
thereof. Thus, it is frequently desirable to express protein components of the PKSs or NRPSs
described above.

Typically, expression of the protein components of the pathway and/or of the
products of the NRPS/PKS pathway is accomplished by placing the subject PKS or NRPS
nucleic acid(s) in an expression vector, and transfecting a cell with the vector such that the
cell expresses the desired product(s).

A) Expression vectors.

The choice of vector depends on the sequence(s) that are to be expressed.
Any transducible cloning vector can be used as a cloning vector for the nucleic acid
constructs of this invention. However, where large clusters are to be expressed, it is preferred
that phagemids, cosmids, P1s, YACs, BACs, PACs, HACs or similar cloning vectors be used for
cloning the nucleotide sequences into the host cell. Phagemids, cosmids, and BACs, for
example, are advantageous vectors due to the ability to insert and stably propagate therein
larger fragments of DNA than in M13 phage and lambda phage, respectively. Phagemids
which will find use in this method generally include hybrids between plasmids and
filamentous phage cloning vehicles. Cosmids which will find use in this method generally
include lambda phage-based vectors into which cos sites have been inserted. Recipient pool
cloning vectors can be any suitable plasmid. The cloning vectors into which pools of
mutants are inserted may be identical or may be constructed to harbor and express different
genetic markers (see, e.g., Sambrook et al., supra). The utility of employing such vectors
having different marker genes may be exploited to facilitate a determination of successful
transduction.

In preferred embodiments of this invention, vectors are used to introduce
PKS, NRPS, or NRPS/PKS genes or gene clusters into host (e.g. Streptomyces) cells.
Numerous vectors for use in particular host cells are well known to those of skill in the art.
For example, Ma[...]; Kao et al. (1994) Science, 265: 509-512; and Hopwood et al. (1987)
Methods Enzymol., 153: 116-166 all describe vectors for use in various Streptomyces hosts.

In a preferred embodiment, Streptomyces vectors are used that include
sequences that allow their introduction and maintenance in E. coli. Streptomyces/E. coli
shuttle vectors have been described (see, for example, Vara et al. (1989) J. Bacteriol.
171: 5872-5881; Guilfoile & Hutchinson (1991) Proc. Natl. Acad. Sci. USA, 88: 8553-8557).

The gene sequences, or fragments thereof, which collectively encode a PKS 
and/or NRPS and/or PKS/NRPS gene cluster, one or more ORFs, one or more modules, or 
one or more enzymatic domains of this invention, can be inserted into one or more
expression vectors, using methods known to those of skill in the art. Expression vectors will
include control sequences operably linked to the desired NRPS and/or PKS coding sequence
or fragment thereof. Suitable expression systems for use with the present invention include
systems that function in eucaryotic and prokaryotic host cells. However, as explained above, 
prokaryotic systems are preferred, and in particular, systems compatible with Streptomyces 
spp are of particular interest. Control elements for use in such systems include promoters, 
optionally containing operator sequences, and ribosome binding sites. Particularly useful 
promoters include control sequences derived from PKS and/or NRPS gene clusters, such as 
one or more act promoters. However, other bacterial promoters, such as those derived from 
sugar metabolizing enzymes, such as galactose, lactose (lac) and maltose, will also find use
in the present constructs. Additional examples include promoter sequences derived from
biosynthetic enzymes such as tryptophan (trp), the beta-lactamase (bla) promoter system,
bacteriophage lambda PL, and T5. In addition, synthetic promoters, such as the tac promoter
(U.S. Patent 4,551,433), which do not occur in nature, also function in bacterial host cells. In
Streptomyces, numerous promoters have been described including constitutive promoters,
such as ermE and tcmG (Shen and Hutchinson, (1994) J. Biol. Chem. 269: 30726-30733), as
well as controllable promoters such as actI and actIII (Pieper et al. (1995) Nature, 378: 263-
266; Pieper et al. (1995) J. Am. Chem. Soc., 117: 11373-11374; and Wiesmann et al. (1995)
Chem. & Biol. 2: 583-589).

Other regulatory sequences may also be desirable which allow for regulation
of expression of the PKS replacement sequences relative to the growth of the host cell.
Regulatory sequences are known to those of skill in the art, and examples include those
which cause the expression of a gene to be turned on or off in response to a chemical or
physical stimulus, including the presence of a regulatory compound. Other types of
regulatory elements may also be present in the vector, for example, enhancer sequences.

Selectable markers can also be included in the recombinant expression 
vectors. A variety of markers are known which are useful in selecting for transformed cell 
lines and generally comprise a gene whose expression confers a selectable phenotype on 
transformed cells when the cells are grown in an appropriate selective medium. Such
markers include, for example, genes that confer antibiotic resistance or sensitivity to the
plasmid. Alternatively, several polyketides are naturally colored and this characteristic 
provides a built-in marker for selecting cells successfully transformed by the present 
constructs. 

The various PKS and/or NRPS clusters or subunits of interest can be cloned
into one or more recombinant vectors as individual cassettes, with separate control elements,
or under the control of, e.g., a single promoter. The PKS and/or NRPS subunits can include
flanking restriction sites to allow for the easy deletion and insertion of other PKS subunits so
that hybrid PKSs can be generated. The design of such unique restriction sites is known to
those of skill in the art and can be accomplished using the techniques described above, such
as site-directed mutagenesis and PCR.
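
By way of illustration only, the choice of flanking restriction sites that remain unique can be
sketched as a simple sequence scan. The three enzymes listed are common examples chosen
for the illustration and are not named in this specification; the function names are
hypothetical.

```python
# Recognition sequences of three commonly used enzymes (illustrative subset).
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def count_sites(seq, site):
    """Count occurrences of a recognition sequence, overlapping ones included.

    The sites above are palindromic, so scanning one strand is sufficient.
    """
    seq = seq.upper()
    count, start = 0, seq.find(site)
    while start != -1:
        count += 1
        start = seq.find(site, start + 1)
    return count


def usable_flanking_enzymes(cassette, vector):
    """Enzymes absent from both the cassette and the vector backbone, so that
    sites added at the cassette ends would be unique in the final construct."""
    return sorted(
        name for name, site in SITES.items()
        if count_sites(cassette, site) == 0 and count_sites(vector, site) == 0
    )


if __name__ == "__main__":
    print(usable_flanking_enzymes("ATGGAATTCAAA", "CCCAAGCTTGGG"))
```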

Methods of cloning and expressing large nucleic acids such as gene clusters, 
including PKS- or NRPS-encoding gene clusters, in cells including Streptomyces are well 
known to those of skill in the art (see, e.g., Stutzman-Engwall and Hutchinson (1989) Proc. 
Natl. Acad. Sci. USA, 86: 3135-3139; Motamedi and Hutchinson (1987) Proc. Natl. Acad. 
Sci. USA, 84: 4445-4449; Grimm et al. (1994) Gene, 151: 1-10; Kao et al. (1994) Science,
265: 509-512; and Hopwood et al. (1987) Methods Enzymol., 153: 116-166). In some
examples, nucleic acid sequences of well over 100 kb have been introduced into cells,
including prokaryotic cells, using vector-based methods (see, for example, Osoegawa et al.,
(1998) Genomics, 52: 1-8; Woon et al., (1998) Genomics, 50: 306-316; Huang et al., (1996)
Nucl. Acids Res., 24: 4202-4209). In addition, the cloning and overexpression of NRPS-1
and NRPS-6 is illustrated in Example 1.

In certain embodiments this invention may make use of genetically 
engineered cells that either lack PKS and/or NRPS genes or have their naturally occurring 
PKS and/or NRPS genes substantially deleted. These host cells can be transformed with 
recombinant vectors, encoding a variety of PKS and/or NRPS gene clusters, for the 
production of active polyketides. The invention provides for the production of significant 
quantities of product, e.g. a bleomycin, at an appropriate stage of the growth cycle. The 
BLMs or other hybrid polyketide/peptide metabolites so produced can be used as therapeutic 
agents, to treat a number of disorders, depending on the type of metabolites in question. For 
example, several of the polyketides and peptides produced by the present method will find 
use as immunosuppressants, as anti-tumor agents, as well as for the treatment of viral, 
bacterial and parasitic infections. The ability to recombinantly produce polyketides and 
peptides also provides a powerful tool for characterizing PKSs and/or NRPSs and the 
mechanism of their actions. 

B) Host cells.

The vectors described above can be used to express various protein
components of the polyketide and/or polypeptide synthetic modules for subsequent isolation
and/or to provide a biological synthesis of one or more desired biomolecules (e.g.
polyketides, peptides, etc.). Where one or more proteins of the blm cluster are expressed
(e.g. overexpressed) for subsequent isolation and/or characterization, the proteins are
expressed in any prokaryotic or eukaryotic cell suitable for protein expression. In one
preferred embodiment, the proteins are expressed in E. coli. Overexpression of blm in E.
coli is described in Example 2.

Host cells for the recombinant production of the subject polyketides can be
derived from any organism with the capability of harboring a recombinant PKS, NRPS or
PKS/NRPS gene cluster. Thus, the host cells of the present invention can be derived from
either prokaryotic or eucaryotic organisms. However, preferred host cells are those
constructed from the actinomycetes, a class of mycelial bacteria which are abundant
producers of a number of polyketides and peptides. A particularly preferred genus for use
with the present system is Streptomyces. Thus, for example, S. verticillus, S. ambofaciens, S.
avermitilis, S. azureus, S. cinnamonensis, S. coelicolor, S. curacoi, S. erythraeus, S. fradiae,
S. galilaeus, S. glaucescens, S. hygroscopicus, S. lividans, S. parvulus, S. peucetius, S.
rimosus, S. roseofulvus, S. thermotolerans, S. violaceoruber, among others, will provide
convenient host cells for the subject invention, with S. coelicolor being preferred (see, e.g.,
Hopwood, D. A. and Sherman, D. H. Ann. Rev. Genet. (1990) 24: 37-66; O'Hagan, D. The
Polyketide Metabolites (Ellis Horwood Limited, 1991), for a description of various
polyketide-producing organisms and their natural products).

In a preferred embodiment, the above-described cells are genetically
engineered by deleting one or more naturally occurring PKS and/or NRPS genes therefrom,
using standard techniques, such as by homologous recombination (see, e.g., Khosla et al.
(1992) Molec. Microbiol. 6: 3237).

In certain embodiments, a eukaryotic host cell is preferred (e.g. where certain
glycosylation patterns are desired). Suitable eukaryotic host cells are well known to those of
skill in the art. Such eukaryotic cells include, but are not limited to, yeast cells, insect cells,
plant cells, fungal cells, and various mammalian cells (e.g. COS, CHO, and HeLa cell lines and
various myeloma cell lines).

C) Protein/polyketide recovery.

Polypeptide and/or polyketide recovery is accomplished according to standard
[...], for example where the blm cluster [...] to express various biomolecules (e.g.
polyketides, sugars, polypeptides, [...] polyketide synthase. Biochemistry 37: 2084-2088;
[...] Guide to Protein Purification, M. Deutscher, [...]).

III. In vivo recombinant synthesis of bleomycins.

In one embodiment this invention provides methods of synthesizing [...]
accomplished by providing an organism (e.g. a bacterial cell) [...] (e.g., S. lividans or S.
coelicolor). Kao et al. (1994) Science, 265: 509-512 have cloned the [...] of the blm
cluster [...] homologous double recombination event in E. coli (Id.). [...]
chloramphenicol resistant (CmR) such as p[...] donor into the temperature-sensitive
recipient by co-transforming E. coli with the CmR recipient plasmid and the apramycin
resistant (ApR) pKC505 d[...] blm gene cluster followed by chloramphenicol and apramycin
selection at 30°C. [...] harboring both plasmids (CmR, ApR) [...] chloramphenicol and
apramycin plates and only those cointegrates formed by a single recombination event
between the two plasmids are viable. Surviving colonies are then propagated at 30°C on
CmR plates to select for recombinant plasmids formed by the resolution of cointegrates
through a second recombination event. The desired blm cluster is thereby cloned into the CmR
temperature-sensitive plasmid and is ready to be moved into any expression plasmid by a
similar homologous recombination event.

For example, if pWHM861 is the choice of shuttle plasmid for the expression
of the blm cluster in S. lividans (Meurer and Hutchinson (1995) J. Bacteriol., 177: 477-481),
the two ends spanning the blm cluster are cloned downstream of the ErmE* promoter in the
ampicillin resistant (AmR) plasmid pWHM861. The resulting plasmid is co-transformed
with the temperature-sensitive plasmid containing the blm cluster described above into E.
coli under the selection of chloramphenicol and ampicillin at 30°C. These CmR and AmR
colonies are shifted to 44°C on chloramphenicol and ampicillin plates to undergo a single
recombination event, and the surviving colonies are resolved on ampicillin plates at 30°C by
completing the double recombination process. The resulting plasmid is suitable for
transformation into S. lividans with thiostrepton selection, in which the expression of the
desired blm cluster is under the control of the ErmE* promoter. The S. lividans
transformants are cultured and any metabolites produced are isolated and characterized.
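
By way of illustration only, the logic of the temperature-shift selection described above can be
expressed as a toy model. The model assumes that a temperature-sensitive replicon is lost
above 30°C and that a cointegrate replicates through the non-temperature-sensitive origin;
both are simplifying assumptions made for the example, and all names are hypothetical.

```python
from dataclasses import dataclass

PERMISSIVE_C = 30    # temperatures taken from the text
RESTRICTIVE_C = 44


@dataclass(frozen=True)
class Replicon:
    name: str
    markers: frozenset
    temperature_sensitive: bool = False


def maintained(replicon, temp_c):
    """Assumption: a temperature-sensitive replicon is lost above 30 C."""
    return not (replicon.temperature_sensitive and temp_c > PERMISSIVE_C)


def survives(replicons, temp_c, antibiotics):
    """A cell survives if its maintained replicons cover every antibiotic."""
    resistances = set()
    for replicon in replicons:
        if maintained(replicon, temp_c):
            resistances |= replicon.markers
    return set(antibiotics) <= resistances


ts_donor = Replicon("ts blm plasmid", frozenset({"Cm"}), temperature_sensitive=True)
shuttle = Replicon("pWHM861 derivative", frozenset({"Am"}))
# Assumption: the cointegrate carries both markers and a non-ts origin.
cointegrate = Replicon("cointegrate", frozenset({"Cm", "Am"}))

if __name__ == "__main__":
    both = {"Cm", "Am"}
    print(survives([ts_donor, shuttle], PERMISSIVE_C, both))
    print(survives([ts_donor, shuttle], RESTRICTIVE_C, both))
    print(survives([cointegrate], RESTRICTIVE_C, both))
```

In this model cells carrying two separate plasmids grow on both antibiotics only at 30°C,
whereas cointegrates also grow at 44°C, which is the basis of the selection.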

Once production of BLM in S. lividans is established, mutated alleles of the 
blm synthetase can be introduced into the blm cluster for the production of BLM analogs. 

IV. Altered endogenous expression of bleomycins.

Using the blm gene cluster information provided herein, one of skill in the art
may regulate the synthesis of endogenous bleomycin. The expression of various ORFs
comprising the blm gene cluster may be increased or decreased to alter bleomycin synthesis
levels.

Methods of altering the expression of endogenous genes are well known to 
those of skill in the art. Typically such methods involve altering or replacing all or a portion 
of the regulatory sequences controlling expression of the particular gene that is to be 
regulated. In a preferred embodiment, the regulatory sequences (e.g., the native promoter) 
upstream of one or more of the blm ORFs are altered. 

This is typically accomplished by the use of homologous recombination to 
introduce a heterologous nucleic acid into the native regulatory sequences. To downregulate 
expression of one or more blm ORFs, simple mutations that either alter the reading frame or 
disrupt the promoter are suitable. To upregulate expression of the blm ORF(s) the native
promoter(s) can be substituted with heterologous promoter(s) that induce higher than normal
levels of transcription.

In a particularly preferred embodiment, nucleic acid sequences comprising the
structural gene in question or upstream sequences are utilized for targeting heterologous
recombination constructs.

The use of homologous recombination to alter expression of endogenous
genes is described in detail in U.S. Patent 5,272,071, WO 91/09955, WO 93/09222, WO
96/29411, WO 95/31560, and WO 91/12650.

V. Synthesis of BLM analogs.

In one embodiment, this invention provides methods of synthesizing
modified bleomycins or bleomycin analogs. In preferred embodiments, the BLM analogs are
synthesized either by introducing specific perturbations into individual NRPS and/or PKS
enzymatic domains or modules, or by reprogramming the linear order in which the NRPS or
PKS enzymatic domains and/or modules appear in the blm synthetase genes. The former
will lead to BLM analogs with targeted modifications at the BLM backbone and the latter
will allow incorporation of other extension units in variable sequence into the biosynthesis of
BLM. In particularly preferred embodiments, the genetically modified blm synthetases are
produced in S. verticillus; however, it will be recognized that the entire blm gene cluster can
be cloned into other hosts, e.g. into S. lividans or S. coelicolor.

In preferred embodiments, modification of the blm gene cluster to yield BLM
analogues is accomplished by one of two different approaches. In one approach, the BLM
enzymatic domains and/or modules are altered in a directed manner (i.e. they are
changed in a preselected way), while in another approach, random/haphazard alterations are
introduced into the blm cluster and the resulting products are screened to identify those with
desired properties.

A) Synthesis of BLM analogs by specific engineering of the blm synthetase
genes.

The blm synthetase genes can be re-engineered by means of specific
mutations or by reprogramming the linear order of the NRPS or PKS enzymatic domains or
modules. In this approach, a wild-type blm synthetase allele is replaced with these mutants
and expressed in an appropriate host (e.g., S. verticillus or a heterologous host). Since
both NRPSs (Stachelhaus et al. (1995) Science, 269: 69-72) and PKSs (Donadio et al. (1993)
Proc. Natl. Acad. Sci. USA, 90: 7119-7123; Donadio et al. (1995) J. Am. Chem. Soc. 117:
9105-9106; Cortes et al. (1995) Science, 268: 1487-1489) have shown considerable tolerance
to reprogramming, it is expected that these modifications of the BLM synthetase will result
in the production of BLM analogs with predicted structural alterations. For example,
targeted modification at the (2S,3S,4R)-4-amino-3-hydroxy-2-methylpentanoic acid (AHM)
moiety of BLM can be accomplished by introduction of mutations into the BlmVIII PKS
module of the BLM synthetase locus. Inactivation of the MT or KR motif by in-frame
deletion or site-directed mutagenesis will result in the production of BLM analogs containing
a demethyl-AHM, oxo-AHM, or oxo-demethyl-AHM moiety, etc.
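
By way of illustration only, the relationship between the state of the MT and KR motifs and
the predicted AHM moiety can be written as a lookup. The assignment of each analog to a
motif follows from the usual roles of the two activities (the methyl transferase adds the methyl
group, the ketoreductase reduces the keto group to a hydroxyl group); the function name is
hypothetical.

```python
def predicted_ahm_moiety(mt_active, kr_active):
    """Predicted AHM variant for a given combination of active motifs."""
    if mt_active and kr_active:
        return "AHM"                 # wild-type module
    if kr_active:
        return "demethyl-AHM"        # MT inactivated: methyl group missing
    if mt_active:
        return "oxo-AHM"             # KR inactivated: keto group not reduced
    return "oxo-demethyl-AHM"        # both motifs inactivated


if __name__ == "__main__":
    for mt in (True, False):
        for kr in (True, False):
            print(f"MT={mt!s:5} KR={kr!s:5} -> {predicted_ahm_moiety(mt, kr)}")
```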

Alternatively, individual functional NRPS domains and/or the PKS module
can be deleted or the PKS module can be duplicated in-frame to produce BLM analogs with
a shorter or longer backbone, respectively. Alternatively, or in addition, the NRPS domains or
the PKS module can be rearranged for the production of BLM analogs with a completely
different backbone. The NRPS and PKS features can be combined into one integrated
system, providing access to a structural variation not available by either the NRPS or PKS
system alone.

To create such mutations, plasmids are constructed carrying in-frame
deletions of DNA segments encompassing a portion of the blm synthetase activities.
Construction of specific deletions is preferably accomplished by one of the following two
strategies. The first involves subcloning of a DNA fragment in a gene replacement vector,
selection of two restriction sites suitably located at the two ends of the DNA segment, and
deletion of this segment from within the plasmid by rejoining the two resulting ends. An
in-frame deletion can be obtained by a suitable combination of Klenow filling and S1 treatment
of both ends prior to ligation.
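
By way of illustration only, whether a planned deletion preserves the reading frame, and how
many bases would have to be trimmed or restored at the ends to make it do so, can be checked
as follows. Coordinates are 0-based positions within the coding sequence, and the function
names are hypothetical.

```python
def is_in_frame(start, end):
    """Deleting seq[start:end] keeps the downstream frame if the deleted
    length is a multiple of three."""
    return (end - start) % 3 == 0


def frame_adjustment(start, end):
    """Bases to trim further, or to restore by fill-in, to reach a multiple of 3."""
    remainder = (end - start) % 3
    return {"trim_more": (3 - remainder) % 3, "fill_back": remainder}


def delete_segment(seq, start, end):
    """Return seq with seq[start:end] removed and the two ends rejoined."""
    if not 0 <= start <= end <= len(seq):
        raise ValueError("deletion coordinates outside the sequence")
    return seq[:start] + seq[end:]


if __name__ == "__main__":
    orf = "ATGAAACCCGGGTAA"
    print(is_in_frame(3, 9), delete_segment(orf, 3, 9))
    print(frame_adjustment(3, 13))
```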

The second approach involves polymerase chain reaction (PCR) amplification
of two DNA segments that flank the region to be deleted followed by joining of the two
fragments in the correct orientation in a gene replacement vector. This can be accomplished
by designing PCR primers with suitable restriction sites. The restriction site used to generate
the deletion and the sequences to serve as templates for the PCR amplification are chosen so
as to generate two segments of blm synthetase DNA of approximately equal length in the
construction in order to maximize the chance of gene replacement. The gene replacement
vector containing the allelic or deletion mutation is introduced into a Streptomyces strain
(e.g. S. verticillus). Integration of the plasmid into the S. verticillus chromosome via a
single reciprocal homologous recombination will yield a recombinant that will be isolated by
selection for the vector marker. The resulting integrants are then grown under nonselective
conditions, and further resolution by selection for the loss of the vector marker via the second
homologous recombination event will produce the desired deletion mutants.
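
By way of illustration only, the choice of two flanking segments of equal length on either side
of the region to be deleted can be sketched as coordinate arithmetic. The restriction site used
to join the two segments is omitted, and the function names and coordinates are hypothetical.

```python
def flanking_arms(seq_len, del_start, del_end, arm_len):
    """Return (start, end) coordinates of equal-length left and right arms."""
    left = (del_start - arm_len, del_start)
    right = (del_end, del_end + arm_len)
    if left[0] < 0 or right[1] > seq_len:
        raise ValueError("arm extends beyond the available sequence")
    return left, right


def deletion_allele(seq, del_start, del_end, arm_len):
    """Join the two arms as they would appear in the gene replacement vector."""
    (ls, le), (rs, re_) = flanking_arms(len(seq), del_start, del_end, arm_len)
    return seq[ls:le] + seq[rs:re_]


if __name__ == "__main__":
    locus = "A" * 10 + "C" * 6 + "G" * 10
    print(flanking_arms(len(locus), 10, 16, 8))
    print(deletion_allele(locus, 10, 16, 8))
```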

Southern analysis of the isolated deletion mutants with the target DNA is
performed to ensure that the expected double crossover recombination event has taken place.
The first approach is convenient if there are suitably spaced restriction sites in the DNA
sequence. The second approach enables the deletion of any DNA segment but may be
limited by the size of the DNA segments that can be amplified by PCR. These
recombinants are cultured under typical conditions for BLM production and the fermentation
broth is screened for the production of any novel BLM analogs resulting from the specific
mutations in the blm synthetase locus.

B) Synthesis of BLM analogs by random modification of blm genes.

Bleomycin analogs can also be synthesized by randomly/haphazardly altering
genes in the BLM cluster, expressing the products of the randomly modified megasynthetase,
and then screening the products for the desired activity. Methods of "randomly" altering blm
cluster genes are described below.

VI. Generation of other systems.

In addition to the production of bleomycin or modified bleomycins, the blm
gene cluster or elements thereof can be used by themselves or in combination with NRPS
and/or PKS modules and/or enzymatic domains of other PKS and/or NRPS systems to
produce a wide variety of compounds including, but not limited to, various polyketides,
polypeptides, polyketide/polypeptide hybrids, various oxazoles and thiazoles, various sugars,
various methylated polypeptides/polyketides, and the like. As with the production of
modified bleomycins described above, such compounds can be produced, in vivo or in vitro,
by catalytic biosynthesis using large, modular PKSs, NRPSs, and hybrid PKS/NRPS
systems. The megasynthetases directing such syntheses can be rationally designed, e.g. by
predetermined alteration/modification of polyketide and/or polypeptide and/or hybrid
PKS/NRPS pathways. Alternatively, large combinatorial libraries of cells harboring various
megasynthetases can be produced by the random modification of particular pathways and
then selected for the production of a molecule or molecules of interest. It will be appreciated
that, in certain embodiments, such libraries of megasynthetases/modified pathways can be
used to generate large, complex combinatorial libraries of compounds which themselves can
be screened for a desired activity.

A) Directed modification of biomolecules.

Elements (e.g. open reading frames) of the blm biosynthetic gene cluster
and/or variants thereof can be used in a wide variety of "directed" biosynthetic processes (i.e.
where the process is designed to modify and/or synthesize one or more particular preselected
metabolites). Polypeptides encoded by particular open reading frames or combinations of
open reading frames can be utilized to perform particular chemical modifications of
biological molecules.

Thus, for example, open reading frames encoding a polypeptide synthetase can
be used to chemically modify an amino acid by coupling it to another amino acid. In another
example, the methyl transferase in BlmVIII can be utilized to introduce methyl groups into
polyketides and other substrates. The glycosyl transferases can be used to glycosylate
appropriate substrates, and so forth. These examples are merely illustrative. One of skill in
the art, utilizing the information provided here, can perform literally countless chemical
modifications and/or syntheses using either "native" bleomycin biosynthesis metabolites as
the substrate molecule, or other molecules capable of acting as substrates for the particular
enzymes in question. Other substrates can be identified by routine screening. Methods of
screening enzymes for specific activity against particular substrates are well known to those
of skill in the art.

The biosyntheses can be performed in vivo, e.g. by providing a host cell
comprising the desired blm gene cluster open reading frame(s), and/or in vitro, e.g., by
providing the polypeptides encoded by the blm gene cluster ORFs and the appropriate
substrates and/or cofactors.

B) Directed engineering of novel synthetic pathways.

In numerous embodiments of this invention, novel polyketides, polypeptides, 
and combinations thereof are created by modifying known PKSs or NRPSs so as to introduce 
variations into known polymers synthesized by the enzymes. Such variations may be 
introduced by design, for example to modify a known molecule in a specific way, e.g. by
replacing a single monomeric unit within a polymer with another, thereby creating a
derivative molecule of predicted structure. Such variations can also be made by adding one
or more modules to a known PKS or NRPS, or by removing one or more modules from a
known PKS or NRPS. Such novel PKSs or NRPSs can readily be made using a variety of 
techniques, including recombinant methods and in vitro synthetic methods. 

Using any of these methods, it is possible to introduce PKS domains into a 
NRPS or vice versa, thereby creating novel molecules including both peptide and polyketide 
structural domains. For example, a PKS enzyme producing a known polyketide can be 
modified so as to include an additional module that adds a peptide moiety into the 
polyketide. Novel molecules synthesized using these methods can be screened, using
standard methods, for any activity of interest, such as antibiotic activity, effects on the cell
cycle, effects on the cytoskeleton, etc.

Novel polyketides, polypeptides, or combinations thereof can also be made by 
creating novel PKSs or NRPSs de novo, using recombinant or in vitro synthetic methods. 
Such novel arrangements of domains can be designed, i.e. to create a specific polymer. In 
addition to creating novel PKSs or NRPSs by combining modules, the methods of this 
invention can also be used to make novel modules that can add new monomeric units to a 
growing polypeptide or polyketide chain. Because the identity of each module, and,
consequently, the identity of the monomer added by the module, is determined by the
identity and number of the functional domains comprising the module, it is possible to
produce novel monomeric units by creating novel combinations of functional domains within
a module. Such novel modules can be created by design, for example to make a specific 
module that will add a specific monomer to a polyketide or polypeptide, or can be created by 
the random association of domains so as to produce libraries of novel modules. Such novel 
modules can be made using recombinant or in vitro synthetic means. 
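
By way of illustration only, a library of candidate modules built from combinations of
functional domains can be enumerated as follows. The core domain set (KS, AT, ACP) is an
assumption made for the example and is not recited in this passage; the tuples record which
domains are present, not their physical order in the protein.

```python
from itertools import combinations

CORE = ("KS", "AT", "ACP")   # assumed minimal PKS module (illustrative)
OPTIONAL = ("KR", "MT")      # modifying domains named elsewhere in the text


def module_variants(core=CORE, optional=OPTIONAL):
    """Every module consisting of the core plus any subset of optional domains."""
    variants = []
    for k in range(len(optional) + 1):
        for extra in combinations(optional, k):
            variants.append(core + extra)
    return variants


if __name__ == "__main__":
    for module in module_variants():
        print("-".join(module))
```

With two optional domains the enumeration yields four candidate modules; each further
optional domain doubles the size of the library.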

Mutations can be made to the native NRPS and/or PKS subunit sequences and
such mutants used in place of the native sequence, so long as the mutants are able to function
with other PKS and/or NRPS subunits to collectively catalyze the synthesis of an identifiable
polyketide and/or polypeptide. Such mutations can be made to the native sequences using
conventional techniques such as by preparing synthetic oligonucleotides including the
mutations and inserting the mutated sequence into the gene encoding a NRPS and/or PKS
subunit using restriction endonuclease digestion (see, e.g., Kunkel (1985) Proc. Natl. Acad.
Sci. USA 82: 448; Geisselsoder et al. (1987) BioTechniques 5: 786). Alternatively, the
mutations can be effected using a mismatched primer (generally 10-20 nucleotides in length)
which hybridizes to the native nucleotide sequence, at a temperature below the melting
temperature of the mismatched duplex. The primer can be made specific by keeping primer
length and base composition within relatively narrow limits and by keeping the mutant base
[...] using DNA polymerase, the product cloned, and clones [...] by segregation of the
primer extended strand [...] accomplished using the mutant primer as a hybridization
probe. The technique is also [...] (Dalbie-McFarland et al. (1982) [...]) [...]
desired mutations.

C) Random modification of PKS/NRPS pathways.

In another embodiment, variations can be made randomly, for example by
making a library of molecular variants of a known polymer by randomly mutating one or
more modules or enzymatic domains in a known PKS or NRPS with a collection of [...]
thereby [...] these methods. These combinations can be made using standard [...]
DNA shuffling methods described in Crameri et al. (1998) Nature [...]. In addition, novel
combinations can be made in vitro, [...] sequences within restriction endonuclease sites,
inserting an oligonucleotide [...], [...] nucleotides during in vitro DNA synthesis, by
error-prone PCR mutagenesis, [...]. Chemical mutagens include, for example, sodium
bisulfite, nitrous acid, hydroxylamine, agents [...], analogues of nucleotide precursors such
as nitrosoguanidine, 5-bromouracil, 2-aminopurine, or acridine intercalating agents such as
proflavine, acriflavine, quinacrine, and the like.
Generally, plasmid DNA or DNA fragments are treated with chemicals, transformed into E.
coli and propagated as a pool or library of mutant plasmids. 

Large populations of random enzyme variants can be constructed in vivo 
using "recombination-enhanced mutagenesis." This method employs two or more pools of, 
for example, 10^6 mutants each of the wild-type encoding nucleotide sequence that are
generated using any convenient mutagenesis technique, described more fully above, and then 
inserted into cloning vectors. 
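
By way of illustration only, the generation of a pool of random point mutants of a wild-type
sequence can be simulated as follows. The per-base mutation rate, the seed, and the small pool
size are arbitrary choices for the example; the simulation does not model any particular
mutagen or the recombination step itself.

```python
import random

BASES = "ACGT"


def mutate(seq, rate, rng):
    """Substitute each base with a different base with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def mutant_pool(seq, size, rate, seed=0):
    """Build a reproducible pool of `size` independently mutated variants."""
    rng = random.Random(seed)
    return [mutate(seq, rate, rng) for _ in range(size)]


if __name__ == "__main__":
    wild_type = "ATGC" * 25
    pool = mutant_pool(wild_type, size=1000, rate=0.01)
    changed = sum(variant != wild_type for variant in pool)
    print(len(pool), "variants,", changed, "differ from wild type")
```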

D) Incorporation and/or modification of non-blm cluster elements.

In either the directed or random approaches, nucleic acids encoding novel . 
combinations of modules and/or enzymatic are introduced into a cell. In one embodiment, 
nucleic acids encoding one or more PKS or NRPS domains are introduced into a cell so as to 
replace one or more domains of an endogenous PKS or NRPS within a chromosome of the 
cell. Endogenous gene replacement can be accomplished using standard methods, such as
homologous recombination. Nucleic acids encoding an entire PKS, NRPS, or combination 
thereof can also be introduced into a cell so as to enable the cell to produce the novel 
enzyme, and, consequently, synthesize the novel polymer. In a preferred embodiment, such 
nucleic acids are introduced into the cell optionally along with a number of additional genes, 
together called a 'gene cluster,' that influence the expression of the genes, survival of the
expressing cells, etc. In a particularly preferred embodiment, such cells do not have any 
other PKS- or NRPS- encoding genes or gene clusters, thereby allowing the straightforward 
isolation of the polymer synthesized by the genes introduced into the cell. 

Furthermore, the recombinant vector(s) can include genes from a single PKS 
and/or NRPS gene cluster, or may comprise hybrid replacement PKS gene clusters with, e.g.,
a gene from one cluster replaced by the corresponding gene from another gene cluster. For
example, it has been found that ACPs are readily interchangeable among different synthases 
without an effect on product structure. Furthermore, a given KR can recognize and reduce 
polyketide chains of different chain lengths. Accordingly, these genes are freely 
interchangeable in the constructs described herein. Thus, the replacement clusters of the 
present invention can be derived from any combination of PKS and/or NRPS gene sets that 
ultimately function to produce an identifiable polyketide and/or peptide.

Examples of hybrid replacement clusters include, but are not limited to, 
clusters with genes derived from two or more of the act gene cluster, the whiE gene cluster,
frenolicin (fren), granaticin (gra), tetracenomycin (tcm), 6-methylsalicylic acid (6-msas),
oxytetracycline (otc), tetracycline (tet), erythromycin (ery), griseusin (gris), nanaomycin,
medermycin, daunorubicin, tylosin, carbomycin, spiramycin, avermectin, monensin,
nonactin, curamycin, rifamycin and candicidin synthase gene clusters, among others. (For a
discussion of various PKSs, see, e.g., Hopwood and Sherman (1990) Ann. Rev. Genet. 24:
37-66; O'Hagan (1991) The Polyketide Metabolites, Ellis Horwood Limited.)

A number of hybrid gene clusters have been constructed, having components 
derived from the act, fren, tcm, gris and gra gene clusters (see, e.g., U.S. Patent 5,712,146).
Other hybrid gene clusters, as described above, can easily be produced and screened using
the disclosure herein, for the production of identifiable polyketides, polypeptides or
polyketide/polypeptide hybrids.

Host cells (e.g., Streptomyces) can be transformed with one or more vectors,
collectively encoding a functional PKS/NRPS set (e.g., a bleomycin or bleomycin analog), or
a cocktail comprising a random assortment of PKS and/or NRPS genes, modules, active
sites, or portions thereof. The vector(s) can include native or hybrid combinations of PKS
and/or NRPS subunits or cocktail components, or mutants thereof. As explained above, the
gene cluster need not correspond to the complete native gene cluster but need only encode
the necessary PKS and/or NRPS components to catalyze the production of the desired
product. For example, in Streptomyces aromatic PKSs, carbon chain assembly requires the
products of three open reading frames (ORFs). ORF1 encodes a ketosynthase (KS) and an
acyltransferase (AT) active site (KS/AT); ORF2 encodes a chain length determining factor
(CLF), a protein similar to the ORF1 product but lacking the KS and AT motifs; and ORF3
encodes a discrete acyl carrier protein (ACP). Some gene clusters also code for a
ketoreductase (KR) and a cyclase, involved in cyclization of the nascent polyketide
backbone. However, it has been found that only the KS/AT, CLF, and ACP need be present
in order to produce an identifiable polyketide. Thus, in the case of aromatic PKSs derived
from Streptomyces, these three genes, without the other components of the native clusters, 
can be included in one or more recombinant vectors, to constitute a "minimal" replacement 
PKS gene cluster. 
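
The "minimal" cluster rule stated above can be written as a simple membership check. The following Python sketch is illustrative only: the component labels and function names are hypothetical and chosen for this example, and the check says nothing about whether a given construct is actually expressed or active.

```python
# Components the text identifies as sufficient for an aromatic Streptomyces PKS.
MINIMAL_AROMATIC_PKS = frozenset({"KS/AT", "CLF", "ACP"})


def missing_minimal_components(components):
    """Return the minimal components absent from a proposed replacement cluster."""
    return MINIMAL_AROMATIC_PKS - set(components)


def is_minimal_cluster_complete(components):
    """True when KS/AT, CLF and ACP are all present; extra genes (KR, cyclase) are allowed."""
    return not missing_minimal_components(components)


if __name__ == "__main__":
    # A cluster carrying only the three required genes passes the check.
    print(is_minimal_cluster_complete(["KS/AT", "CLF", "ACP"]))
    # A cluster with accessory genes but no CLF or ACP is reported as incomplete.
    print(sorted(missing_minimal_components(["KS/AT", "KR", "cyclase"])))
```

Accessory components such as the ketoreductase or cyclase are deliberately ignored by the check, mirroring the statement that only the KS/AT, CLF, and ACP need be present.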

D) Variation of starter and extender units.
In addition to varying the PKS and/or NRPS modules and/or domains,
variations in the products produced by various PKS/NRPS systems can be obtained by
varying the starter units and/or the extender units. Thus, for example, a considerable degree
of variability exists for starter units, e.g., acetyl CoA, malonamyl CoA, propionyl CoA,
acetate, butyrate, isobutyrate and the like. In addition, naturally occurring PKSs and/or
NRPSs have shown some tolerance for varying extender units.

E) Examples of preferred modifications.

As indicated above, the novel PKS and NRPS modules and enzymatic
domains identified herein can be used to perform specific single modifications of particular
substrates, or as components of complex synthetic pathways to generate particular products
or large combinatorial libraries. As described in the Examples, a number of modules of the
blm gene cluster provide novel functionalities. By way of example, a few preferred reactions
are listed below. These examples are intended to be illustrative and are not exhaustive or
limiting.

1) Use of BlmVIII PKS to introduce a branched methyl group.
The blmVIII gene identified herein encodes a PKS module consisting of
domains characteristic for known PKSs, such as ketoacyl synthase (KS), acyltransferase
(AT), ketoreductase (KR), and ACP, with malonyl CoA acting as an extending unit.
However, the identification of an integrated methyltransferase (MT) domain in the middle of
BlmVIII is unique, representing the first PKS from actinomycetes that contains an internal
MT domain. The use of this methyltransferase domain allows the introduction of a branched
methyl group during a polyketide and/or polypeptide and/or hybrid
polyketide/polypeptide synthesis. Figure 5 illustrates the use of the Blm PKS in engineering
a polyketide biosynthesis that introduces a branched methyl group.

The first formula in Figure 5 illustrates a polyketide synthesis mediated by
6-deoxyerythronolide B synthase (DEBS), which normally catalyzes the biosynthesis of the
erythromycin aglycone, 6-deoxyerythronolide B. Use
of the BlmVIII methyltransferase (MT) domain at different points in the synthesis results in
introduction of a methyl group at different locations in the resulting product.

In view of this illustration, one of skill in the art would appreciate that the
BlmVIII MT domain can be used in a wide variety of biosyntheses to introduce methyl
branches.

2) Use of the blm gene cluster to make thiazolidine, thiazoline, thiazole,
bi-thiazolidine, bithiazoline and bithiazole-containing compounds.

The BlmIV and BlmIII NRPSs are characterized by unusual Cy domains as
well as an unprecedented Ox domain, providing an efficient biosynthesis for a bithiazole
structure. While thiazoline is the direct product of the Cy domain, the thiazoline-to-thiazole
conversion generally is performed with an additional oxidation step. We identified at the
C-terminus of NRPS-0 an additional domain that shows low, but significant, sequence
homology to a family of putative oxidases/dehydrogenases, including the McbC protein of
the microcin B17 synthase (Table 1). Microcin B17 synthase catalyzes the synthesis of the
oxazole and thiazole-containing peptide antibiotic microcin B17, and McbC has been
proposed to play a role in catalyzing the oxazoline/thiazoline-to-oxazole/thiazole conversion.
Consequently, we propose that this extra domain at the C-terminus of NRPS-0 provides the
oxidase/dehydrogenase activity for the biosynthesis of the bithiazole moiety of BLM,
defining a novel Ox domain for NRPSs.

It is noteworthy that a cell-free preparation from Sv ATCC15003 has been
reported to catalyze the conversion of phleomycins to BLMs in the presence of NAD+,
supporting the hypothesis that the bithiazole moiety of BLM results from stepwise
oxidations of a bithiazoline precursor (Fig. 1A). (The phleomycin producer could be
imagined to result from the loss of its Ox activity for the first thiazoline ring.) Given the
wide distribution of thiazole or oxazole rings in natural products exhibiting an impressive
array of biological activities, the cloning of the blmIV and blmIII genes and the identification
of the Ox domain open many opportunities to investigate thiazole biosynthesis and to
synthesize novel thiazole-containing molecules by engineering peptide biosynthesis.

Representative thiazole syntheses using variants of the blm NRPS are
illustrated in Figure 6. Note that in Figure 6, A_M and A_N refer to A domains that activate
an amino acid with R_M and R_N groups, respectively. A_C refers to an A domain that
activates Cys (X = SH) or Ser (X = OH) that can be cyclized to form the oxazoline/thiazoline
or oxazole/thiazole structures. DH is a dehydratase. In view of these representative
examples, one of skill in the art would appreciate that the blm NRPS domain and its variants
can be used in a wide variety of chemical syntheses to make thiazolidine, thiazoline, thiazole,
bi-thiazolidine, bithiazoline, or bithiazole-containing compounds.

3) Use of the blm gene cluster to make heterocyclic ring-containing
compounds.

Various blm modules can be used to produce heterocyclic ring-containing 
compounds. Such heterocycles include, but are not limited to, five-membered S- and
N-containing compounds of the thiazolidine, thiazoline and thiazole family or the O- and
N-containing compounds of the oxazolidine, oxazoline, and oxazole family. Again, the
preparation of such compounds is illustrated in Figure 6. 

4) Use of the blm gene cluster to make sugars.

In still another embodiment, the blm gene cluster or elements thereof can be 
used to make sugars. Such sugars include, but are not limited to L-sugars (with the BlmG 
epimerase), sugars modified by a carbamoyl group (e.g., using BlmD), and various
disaccharides. Representative examples of such syntheses are illustrated in Figure 7. Such
sugar biosynthesis genes can also be used to attach sugars onto other polyketide and/or
peptide aglycones. 

F) Screening of products. 

Particularly where large combinatorial libraries are synthesized, e.g., using one
or more modules and/or enzymatic domains of the blm gene cluster, it will often be desired to
screen the resulting compound(s) for the desired activity. Methods of screening compounds
(e.g., polypeptides, polyketides, sugars, thiazoles, etc.) for various activities of interest (e.g.,
cytotoxicity, antimicrobial activity, particular chemical activities, etc.) are well known to
those of skill in the art.

Where large numbers of compounds are produced, it is often desired to 
rapidly screen such compounds using "high throughput systems" (HTS). High throughput
assay systems are well known to those of skill in the art and many such systems are
commercially available (see, e.g., Zymark Corp., Hopkinton, MA; Air Technical Industries,
Mentor, OH; Beckman Instruments, Inc., Fullerton, CA; Precision Systems, Inc., Natick,
MA, etc.). These systems typically automate entire procedures including all sample and
reagent pipetting, liquid dispensing, timed incubations, and final readings of the microplate
in detector(s) appropriate for the assay. These configurable systems provide high
throughput and rapid start up as well as a high degree of flexibility and customization. The
manufacturers of such systems typically provide detailed protocols for the various high
throughput screens.

VII. In Vitro syntheses.

In additional embodiments of this invention, bleomycins and other 
polyketides and/or polypeptides are synthesized and/or modified in vitro. Individual 
enzymatic domains or modules can be used in vitro to modify a unit and/or to add a single 
monomeric unit to a growing polyketide or polypeptide chain. In one approach a
megasynthetase providing all the desired synthetic activities is recombinantly expressed and
then provided with the appropriate substrates and buffer system, e.g., in a bioreactor, to direct the
synthesis of the desired product. In another approach, various PKSs and/or NRPSs are
provided in different solutions and the growing polymer chains can be sequentially
introduced into the plurality of solutions, each containing a single (or several) PKS or NRPS
modules. In still another embodiment, the PKS and/or NRPS modules or enzymatic domains
are provided attached to a solid support and a fluid containing the growing macromolecule
is passed over the surface whereby the PKSs or NRPSs are able to react with the target 
substrate. 

In one preferred embodiment, a combinatorial library of polyketides or 
polypeptides, or combinations thereof, is created by using automated means to facilitate the 
sequential introduction of a multitude of polymeric chains, each attached to a solid support, 
to a collection of solutions, each containing a single PKS or NRPS module. These
automated means can be used to systematically vary the sequence by which each polymeric 
chain is introduced into the various solutions, thereby creating a combinatorial library. 
Numerous methods are well known in the art to create combinatorial libraries of molecules 
by the sequential addition of monomeric units, for example as described in WO 97/02358. 
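
The systematic variation of visiting order described above can be sketched as an enumeration problem. The Python below is a minimal illustration under stated assumptions: the module labels ("M1", "M2", "M3") and the function name are hypothetical, and each tuple stands for the order in which one support-bound chain would be passed through the module solutions.

```python
from itertools import permutations, product


def visit_orders(module_solutions, steps=None, allow_repeats=False):
    """Enumerate the orders in which one support-bound chain visits module solutions.

    module_solutions: labels of the available single-module solutions.
    steps: number of solutions each chain visits (defaults to all of them).
    allow_repeats: whether a chain may visit the same solution more than once.
    """
    steps = len(module_solutions) if steps is None else steps
    if allow_repeats:
        return list(product(module_solutions, repeat=steps))
    return list(permutations(module_solutions, steps))


if __name__ == "__main__":
    solutions = ["M1", "M2", "M3"]  # hypothetical module labels
    for order in visit_orders(solutions):
        print(" -> ".join(order))
```

Each enumerated order corresponds to one member of the library, so the library size grows factorially without repeats and as a power of the number of solutions when repeats are allowed.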

VIII. Kits. 

In still another embodiment, this invention provides kits for practice of the 
methods described herein. In one preferred embodiment, the kits comprise one or more 
containers containing nucleic acids encoding one or more of the blm gene cluster ORFs 
and/or one or more of the BLM PKS or NRPS modules or enzymatic domains. Certain kits 
may comprise vectors encoding the blm ORFs and/or cells containing such vectors. The kits
may optionally include any reagents and/or apparatus to facilitate practice of the assays
described herein. Such reagents include, but are not limited to, buffers, labels, labeled
antibodies, bioreactors, cells, etc.

In addition, the kits may include instructional materials containing directions
(i.e., protocols) for the practice of the methods of this invention. Preferred instructional
materials provide protocols utilizing the kit contents for creating or modifying a blm
ORF and/or for synthesizing or modifying a molecule using one or more blm modules and/or
enzymatic domains. While the instructional materials typically comprise written or printed
materials they are not limited to such. Any medium capable of storing such instructions and
communicating them to an end user is contemplated by this invention. Such media include,
but are not limited to, electronic storage media (e.g., magnetic discs, tapes, cartridges, chips),
optical media (e.g., CD ROM), and the like. Such media may include addresses to internet
sites that provide such instructional materials.

EXAMPLES 

The following examples are offered to illustrate, but not to limit, the claimed
invention.

Example 1 

„i^^^ 

r . r iMp and nol T 1 ""' 1 ' Mnsvnthesis. 
Here we report the cloning and characterization of the blm biosynthesis gene
cluster from Sv ATCC15003 (Fig. 2). Sequence analysis and biochemical characterization of

to constitute the Blm megasynthetase complex (Fig. 1B). These studies revealed
unprecedented features for peptide and polyketide biosynthesis,

and supported the wisdom of combining individual NRPS and PKS modules for
combinatorial biosynthesis to make novel natural products from amino acids and
short carboxylic acids.

Materials and Methods.

General procedures.

Escherichia coli DH5α (Sambrook et al. (1989) Molecular Cloning: A
Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor,
NY), [illegible] (Madison, WI), and Sv ATCC15003 (American Type Culture Collection,
Rockville, MD) were used in this work. pOJ446 ([illegible], Peoria,
IL), pQE60 (Qiagen, Santa Clarita, CA), pET28a and pET29a (Novagen, Madison, WI), and
other plasmids were from commercial sources. E. coli (Sambrook, supra) and Sv ATCC15003 strains
(Hopwood et al. (1985) Genetic Manipulation of Streptomyces: A Laboratory Manual, The
John Innes Foundation, Norwich, UK) were cultured under standard conditions.

Plasmid preparation was carried out by using commercial kits (Qiagen). Total
Sv ATCC15003 DNA was isolated according to literature protocols (Hopwood et al. (1985)
Genetic Manipulation of Streptomyces: A Laboratory Manual, The John Innes Foundation,
Norwich, UK; Nagaraja et al. (1987) Methods Enzymol. 153: 166-198). Restriction enzymes
and other molecular biology reagents were from commercial sources, and digestions and
ligations followed standard methods (Sambrook, supra). For Southern analysis, digoxigenin
labelling of DNA probes, hybridization, and detection were performed according to the
protocols provided by the manufacturer (Boehringer Mannheim Biochemicals, Indianapolis,
IN).

Automated DNA sequencing was carried out on an ABI Prism 377 DNA 
Sequencer (Perkin-Elmer/ABI, Foster City, CA), and this service was provided by either the 
DBS Automated DNA Sequencing Facility, UC Davis, or Davis Sequencing (Davis, CA).
Data were analyzed by the ABI Prism Sequencing 2.1.1 software and the Genetics Computer 
Group (GCG) program (Madison, WI). 

Cloning and sequencing of the blm gene cluster.

A genomic library of Sv ATCC15003 was constructed in pOJ446 according to
literature procedures (Nagaraja et al. (1987) Methods Enzymol. 153: 166-198) and screened
with probes made from both ends of the blmAB locus (Sugiyama et al. (1994) Gene 151: 11-16;
Calcutt and Schmidt (1994) Gene 151: 17-21), leading to the localization of 140-kb
contiguous DNA, of which 100-kb is upstream (Fig. 2) and 40-kb is downstream (data not
shown) of the blmAB genes. Heterologous NRPS probes were amplified from Sv
ATCC15003 by polymerase chain reaction (PCR) according to literature procedures (Turgay
and Marahiel (1994) Peptide Res. 7: 238-241) and used to screen the entire 140-kb DNA by 
Southern analysis under various hybridization conditions (Shen et al. (1999) Bioorg. Chem. 
27: 155-171). 

Prediction of substrate specificity of NRPSs.
The nine Blm NRPS modules were compared with eighty-four modules from
various bacterial and fungal NRPSs available in GenBank, including those with known or
putative specificity for amino acids present in BLM. A table of overall similarities/identities
was generated by PILEUP analysis of the A3 to A6 regions, and the residues lining the
substrate binding pocket by comparison with PheA (Conti et al. (1997) EMBO J. 16: 4174-4183)
were determined by PILEUP/PRETTY analysis. The percentage similarities for each
Blm NRPS module were plotted against the rest of the NRPS modules to display the overall 
sequence homology between the A3 to A6 region. Those modules that showed significantly 
higher homology were selected to compare the amino acid residues that line the substrate 
binding pocket. 
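
The comparison logic described above (score a query region against reference modules of known specificity, then adopt the substrate of the clearly higher-scoring references) can be sketched as follows. This is not the PILEUP procedure itself: percent identity over pre-aligned strings is used here as a simple stand-in for the similarity score, the 50% cutoff is an assumed parameter, and the sequences in the example are made-up toy strings, not real A3 to A6 regions.

```python
def percent_identity(a, b):
    """Percent identical positions between two pre-aligned, equal-length sequences.

    Positions where either sequence has a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def predict_substrate(query_region, references, threshold=50.0):
    """Return the substrate of the best-scoring reference at or above threshold, else None.

    references: iterable of (substrate, aligned_region) pairs of known specificity.
    threshold: assumed cutoff separating background from significant similarity.
    """
    best_score, best_substrate = threshold, None
    for substrate, region in references:
        score = percent_identity(query_region, region)
        if score >= best_score:
            best_score, best_substrate = score, substrate
    return best_substrate


if __name__ == "__main__":
    # Toy, invented sequences used purely to exercise the functions.
    references = [("L-Cys", "ACDEFGHIKV"), ("L-Thr", "ACDMMMMMMM")]
    print(predict_substrate("ACDEFGHIKL", references))
```

In practice the candidate returned by such a screen would then be checked against the residues lining the substrate binding pocket, as the text describes.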

Overproduction and biochemical characterization of the NRPS-1A and NRPS-6A
proteins.

Heterologous expression of the A domains in E. coli was performed according
to literature procedures (Mootz and Marahiel (1997) J. Bacteriol. 179: 6843-6850). NRPS-1A
(forward primer 5'-AAC CCA TGG CTG CTT CCC TGA CCC GCC TGG CC-3', SEQ
ID NO:76, and reverse primer 5'-CCT AGA TCT ACG GGC AGG TGG GGC GGT-3',
SEQ ID NO:77) and NRPS-6A (forward primer 5'-GGG AAT TCC ATA TGA TCC TCA
CGT CCT TCC AC-3', SEQ ID NO:78, and reverse primer 5'-GGC AAG CTT GGG TGA
GGG TCC GTT CGG T-3', SEQ ID NO:79) were amplified by PCR from Sv ATCC15003
cosmid clones. The resulting 1.6-kb fragment of NRPS-1A was first cloned into the
NcoI/BglII sites of pQE60 and then moved as an NcoI/HindIII fragment into the similar sites
of pET29a to yield pBS10, and the resulting 1.6-kb fragment of NRPS-6A was directly
cloned into the NdeI/HindIII sites of pET28a to yield pBS11. Introduction of pBS10 and
pBS11 into E. coli BL21(DE-3) under standard expression conditions resulted in production
of NRPS-1A (with an N-terminal S-tag and a C-terminal His6-tag) and NRPS-6A (with an
N-terminal His6-tag), respectively. The soluble fractions of fusion proteins were subjected
sequentially to affinity chromatography on Ni-NTA resin and anion exchange
chromatography on a Hyper-D column (PerSeptive Biosystems, Framingham, MA), resulting
in NRPS-1A and NRPS-6A at near homogeneity.
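
As a consistency check on the cloning scheme, the primers can be scanned for the recognition sequences of the enzymes named above. The Python sketch below is illustrative: the primer sequences are copied from the text, the recognition sequences are the standard ones for these enzymes, and the dictionary and function names are hypothetical.

```python
# Standard recognition sequences of the restriction enzymes named in the cloning steps.
RECOGNITION_SITES = {
    "NcoI": "CCATGG",
    "BglII": "AGATCT",
    "NdeI": "CATATG",
    "HindIII": "AAGCTT",
}

# Primer sequences as given in the text (5' to 3').
PRIMERS = {
    "NRPS-1A forward": "AAC CCA TGG CTG CTT CCC TGA CCC GCC TGG CC",
    "NRPS-1A reverse": "CCT AGA TCT ACG GGC AGG TGG GGC GGT",
    "NRPS-6A forward": "GGG AAT TCC ATA TGA TCC TCA CGT CCT TCC AC",
    "NRPS-6A reverse": "GGC AAG CTT GGG TGA GGG TCC GTT CGG T",
}


def sites_in_primer(primer):
    """Map each enzyme whose site occurs in the primer to the 0-based site position."""
    seq = primer.replace(" ", "").upper()
    return {
        name: seq.find(site)
        for name, site in RECOGNITION_SITES.items()
        if site in seq
    }


if __name__ == "__main__":
    for label, primer in PRIMERS.items():
        print(label, sites_in_primer(primer))
```

The scan finds an NcoI site in the NRPS-1A forward primer, a BglII site in its reverse primer, an NdeI site in the NRPS-6A forward primer, and a HindIII site in its reverse primer, matching the restriction sites used for cloning into pQE60 and pET28a.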



Results and Discussion. 



Cloning of the blm gene cluster from Sv ATCC15003.

Davies and co-workers previously cloned two BLM resistance genes (blmA 
and blmB) from Sv ATCC15003 (Sugiyama et al. (1994) Gene 151: 11-16), and Calcutt and 
Schmidt (1994) Gene 151: 17-21, sequenced a 7.2-kb DNA fragment flanking the blmAB
genes, revealing seven open reading frames (orfs), none of which were found to encode Blm
NRPS or PKS enzymes. Given the precedent that antibiotic production genes commonly 
occur as a cluster in actinomycetes, we adopted an approach combining chromosomal 
walking from the blmAB resistance locus and DNA hybridization with heterologous NRPS 
probes to clone and identify the blm cluster, leading to the localization of 140-kb contiguous 
Sv ATCC15003 DNA. DNA sequencing of approximately 90-kb of the blm gene cluster, 
including the 7.2-kb blmAB locus, revealed 40 ORFs (Fig. 2). Preliminary functional 
assignments were made by comparison of the deduced gene products with proteins of known 
functions in the database. Among the ORFs identified from the blm cluster, we indeed found 
a PKS module flanked by several NRPS modules, a fact that supports the hybrid
NRPS/PKS/NRPS hypothesis for BLM biosynthesis, along with several sugar biosynthesis
genes and genes encoding other biosynthesis enzymes as well as several resistance and
regulatory genes (Table 1).

Noteworthy are the genes encoding the putative NRPS and PKS enzymes. 
The blmI, blmII, and blmXI genes encode NRPSs with an unusual architecture. In contrast to
all known NRPSs, which are of modular organization with each module consisting
minimally of a condensation (C), an adenylation (A), and a peptidyl carrier protein (PCP)
domain (1), BlmI, BlmII, and BlmXI are discrete proteins homologous to individual domains
of type I NRPSs. We have characterized BlmI as a type II PCP (18). The BlmII and BlmXI
proteins could serve as candidates for type II condensation enzymes. It is unclear yet what
role, if any, these discrete NRPS enzymes could play in BLM biosynthesis.

The blmIII, blmIV, blmV, blmVI, blmVII, blmIX, and blmX genes encode
modular NRPSs consisting of domains characteristic for known type I NRPSs (A special
thematic issue on polyketide and nonribosomal polypeptide biosynthesis, (1997) Chem. Rev.
97: 2463-2706), such as the A, PCP, C, and condensation/cyclization (Cy) domains (Konz et
al. (1997) Chem. Biol. 4: 927-937), as well as an unprecedented oxidation (Ox) domain (see
discussion below). However, BlmVI is unique among all the Blm NRPSs identified. Its
N-terminal module (NRPS-5) consists of an atypical A domain, which bears a close
resemblance to a family of acyl CoA synthases (Fitzmaurice and Kolattukudy (1997) J.
Bacteriol. 179: 2608-2615; Fitzmaurice and Kolattukudy (1998) J. Biol. Chem. 273: 8033-8039),
and an acyl carrier protein (ACP)-like domain (A special thematic issue on polyketide
and nonribosomal polypeptide biosynthesis, (1997) Chem. Rev. 97: 2463-2706). Its
C-terminal module is truncated and presumably interacts with BlmV to constitute the complete
NRPS-3 module (Fig. 1B). Also noteworthy are the C domain of NRPS-3, which lacks both
His residues of the conserved HHxxxDG (SEQ ID NO:4) active site for transpeptidation
(Stachelhaus et al. (1998) J. Biol. Chem. 273: 22773-22781), and the extra C domain at the
C-terminus of BlmV. These unusual features associated with BlmVI and BlmV may play
roles in the formation of the β-aminoalaninamide and the pyrimidine moieties of BLM,
which are unprecedented in peptide biosynthesis. For example, we propose that the
NRPS-4-activated Ser is first dehydrated into dehydroalanine before condensation; an analogous
Thr-to-2,3-dehydroaminobutyric acid dehydration has been observed in syringomycin
biosynthesis (Guenzi et al. (1998) J. Biol. Chem. 273: 32857-32863). Conjugate addition to
dehydroalanine by Asn on the NRPS-3 module downstream followed by an aminolysis to
cleave the Ser-Asn adduct off the Blm megasynthetase furnishes the β-aminoalaninamide
moiety (Fig. 1B). The former reaction could be catalyzed by the C domain of NRPS-3 that
apparently is nonfunctional for normal transpeptidation due to the lack of the active sites,
and the latter reaction could be catalyzed by the acyl CoA synthase-like domain of NRPS-5
in a process that resembles the acyl CoA synthase-catalyzed synthesis of acyl CoA from
carboxylic acid (Stachelhaus et al. (1998) J. Biol. Chem. 273: 22773-22781; Guenzi et al.
(1998) J. Biol. Chem. 273: 32857-32863) but in the reverse direction in the presence of an
amino donor (Fig. 1B).

The blmVIII gene encodes a PKS module consisting of domains characteristic
for known PKSs, such as ketoacyl synthase (KS), acyltransferase (AT), ketoreductase (KR),
and ACP, with malonyl CoA acting as an extending unit according to sequence comparison
of the AT domain (Haydock et al. (1995) FEBS Lett. 374: 246-248) (Fig. 1B). However, the
identification of an integrated methyltransferase (MT) domain (Kagan and Clarke (1994)
Arch. Biochem. Biophys. 310: 417-427) in the middle of BlmVIII is unique, representing the
first PKS from actinomycetes that contains an internal MT domain. The only other example
of a PKS from bacteria that contains an internal MT domain is HMWP1 of the yersiniabactin
gene cluster (Pelludat et al. (1998) J. Bacteriol. 180: 538-546). It has been assumed that
fungal PKSs in general contain internal MTs for the introduction of methyl branches into the
polyketide products, as has been shown recently in lovastatin biosynthesis (Kennedy et al.
(1999) Science 284: 1368-1372). 

The Blm megasynthetase-templated assembly of BLM.

According to the hybrid NRPS/PKS/NRPS model for BLM biosynthesis (Fig. 
1A), we predict a linear modular organization of individual NRPS and PKS modules to 
constitute the Blm megasynthetase. Thus, the first functional domain of the Blm

syringomycin synthetase gene cluster (Guenzi et al. (1998) J. Biol. Chem. 273: 32857-
32863), and in fact, Grandi and co-workers have demonstrated recently in Bacillus subtilis 
that neither the operon-type structure nor the physical linkage of individual modules is 
essential for proper assembly and activity of the surfactin NRPS megasynthetase (Guenzi et 
al. (1998) J. Biol. Chem. 273: 14403-14410).] Realizing that the BLM biosynthesis cannot 
be rationalized according to the "colinearity rule", we determined the substrate specificity of 
individual NRPS and PKS modules in an attempt to shed light on the modular organization 
of the Blm megasynthetase complex. Brick and co-workers postulated, based on the X-ray
structural analysis of PheA, the A domain of GrsA, that the region between core sequences
A3 and A6 represents the amino acid specificity determinant of an NRPS module (Conti et al.
(1997) EMBO J. 16: 4174-4183). Since the A domains in all known NRPSs share
significant sequence identity (ensuring that the main chain conformation of the enzymes is
likely to be very similar), they further proposed that the differing substrate specificity of
individual NRPS modules will be mainly determined by the nature of the amino acids lining
the substrate binding pocket (Stachelhaus et al. (1999) Chem. Biol. 6: 493-505; Conti et al.
(1997) EMBO J. 16: 4174-4183). Given this structural information and the vast number of
NRPS sequences available in GenBank, we developed a novel approach for predicting
substrate specificity for NRPS modules by comparing the overall sequence of the A3
to A6 region and the eight amino acid residues that line the substrate binding pocket.
While a constant level of similarities (30%-40%) was evident among all the NRPS modules 
analyzed, most of the Blm NRPS modules showed striking similarities (50%-60%) to a 
particular cluster of NRPS modules as exemplified in Fig. 3A for NRPS-1 and NRPS-6. 
Close examination of these modules clustered with higher similarities revealed that they 
activate the same or very similar amino acid, based on which the putative substrate for the 
NRPS in query could be predicted, i.e., NRPS-1 and NRPS-6 activate L-Cys and L-Thr,
respectively. These predictions were further supported by comparing the residues lining the
substrate binding pocket. For example, the amino acid residues lining the substrate binding
pocket for NRPS-1 and NRPS-6 are almost identical to those of NRPS modules that are known
to activate L-Cys and L-Thr, respectively, as shown in Fig. 3B. To verify the predicted
amino acid specificity, we overproduced and purified the NRPS-1A and NRPS-6A proteins
(Fig. 3C) and examined their substrate specificity according to the amino acid-dependent
ATP-PPi assay (Lee et al. (1975) Meth. Enzymol. 43: 585-602; Ku et al. (1997) Chem. &
Biol. 4: 203-207). NRPS-1A and NRPS-6A indeed activate specifically L-Cys and L-Thr,
respectively, among the amino acids tested (Fig. 3D). The latter results greatly enhanced our
confidence in predicting the substrate specificity of an NRPS module by the above method.
We subsequently determined the substrate specificity for all the NRPS modules identified
from the blm gene cluster and they in fact accounted for all nine amino acids required for
BLM biosynthesis (Fig. 2).

Using the substrate specificity of individual NRPS and PKS modules as a 
guide, we can align the nine NRPS modules and one PKS module to constitute the Blm
megasynthetase as shown in Fig. 1B according to our hybrid NRPS/PKS/NRPS model for
BLM biosynthesis (Fig. 1A). Among all the PKS or NRPS systems examined so far, the
Blm megasynthetase consists of the largest number of individual proteins. The precise
interactions among all the Blm NRPS and Blm PKS proteins to constitute the Blm
megasynthetase complex, therefore, reflect a remarkable power of protein-protein
recognition (Guenzi et al. (1998) J. Biol. Chem. 273: 14403-14410; Gokhale et al. (1999)
Science 284: 482-485). Although we have yet to provide direct evidence supporting the
specific protein-protein interactions between the neighboring proteins, it is striking to note
that all the biosynthetic intermediates isolated are derailed from either PKS or NRPS
modules at the junctions between the interacting proteins (Fig. 1B). Since it is not difficult
to imagine that an intermediate is more likely to fall off the enzyme complex when it is
subjected to interpeptide transfer than to intrapeptide transfer, we view the latter observation
as strong evidence supporting the current model of the Blm megasynthetase.

BlmIX/BlmVIII/BlmVII as a hybrid NRPS/PKS/NRPS model.

Recent biosynthetic studies on rapamycin in Streptomyces hygroscopicus 
(Konig et al. (1997) Eur. J. Biochem. 247: 526-534), yersiniabactin in Yersinia 
enterocolitica and Y. pestis (Pelludat et al. (1998) J. Bacteriol. 180: 538-546; Gehring et al. 
(1998) Chem. Biol. 5: 573-586; Gehring et al. (1998) Biochemistry 37: 11637-11650), and 
TA in Myxococcus xanthus (Paitan et al. (1999) J. Mol. Biol. 286: 465-474) are starting to 
shed light on hybrid peptide and polyketide biosynthesis. Two models are emerging for the 
alignment between an NRPS and a PKS module. The interacting NRPS and PKS modules 
could be either covalently linked, with all domains arranged in a linear order on the same 
protein (Pelludat et al. (1998) J. Bacteriol. 180: 538-546; Gehring et al. (1998) Chem. Biol. 
5: 573-586; Gehring et al. (1998) Biochemistry 37: 11637-11650; Paitan et al. (1999) J. Mol. 
Biol. 286: 465-474), or physically located on two separate proteins, requiring specific 
protein-protein recognition to ensure the correct pairing between the interacting modules 
(Pelludat et al. (1998) J. Bacteriol. 180: 538-546; Konig et al. (1997) Eur. J. Biochem. 247: 
526-534; Gehring et al. (1998) Chem. Biol. 5: 573-586; Gehring et al. (1998) Biochemistry 
37: 11637-11650). Common to all these systems, however, are the unusual features associated 
with the interacting modules, such as the lack of the AT domain of the PKS module in Ta1 
(Paitan et al. (1999) J. Mol. Biol. 286: 465-474) and the lack of the A domain and the 
presence of the Cy domain of the NRPS modules in both HMWP1 and HMWP2 (Pelludat et 
al. (1998) J. Bacteriol. 180: 538-546; Gehring et al. (1998) Chem. Biol. 5: 573-586; Gehring 
et al. (1998) Biochemistry 37: 11637-11650). While extremely intriguing, the latter features 
complicate mechanistic analysis of these systems, making them less than ideal candidates for 
studying how NRPS and PKS integrate into a productive hybrid NRPS/PKS complex. 

The BlmIX/BlmVIII/BlmVII system combines the features of both hybrid 
NRPS/PKS and PKS/NRPS systems, serving as an ideal model for studying hybrid peptide 
and polyketide biosynthesis. The fact that the BlmIX and BlmVII NRPS modules and 
the BlmVIII PKS module are three separate proteins, each with a typical domain 
organization for NRPS and PKS enzymes, greatly simplifies the mechanistic analysis of the 
hybrid NRPS/PKS/NRPS complex. We have found that the KS domain of BlmVIII is more 
similar to the KSs of HMWP1 (Pelludat et al. (1998) J. Bacteriol. 180: 538-546) and Ta1 
(Paitan et al. (1999) J. Mol. Biol. 286: 465-474), both of which catalyze the elongation of a 
peptidyl intermediate with a malonate, than to KSs of type I PKSs. We attribute these subtle 
differences to their unique reactivity in catalyzing the transfer of the peptidyl intermediate 
from the PCP to the KS domain, which presumably takes place prior to chain elongation 
(Fig. 4). Subsequent condensation catalyzed by the KS domain between the peptidyl 
intermediate and malonyl-S-ACP results in the elongation of the growing peptide with a 
carboxylic acid. Equally striking are the discoveries that the ACP domain of BlmVIII is 
more similar to a PCP than to an ACP and that the C domain of BlmVII has an additional 
N-terminal segment of about 50 amino acids that is rich in arginine, aspartic acid, and glutamic 
acid. The latter feature is analogous to the N-terminal interpolypeptide linker for type I PKS, 
which has recently been demonstrated to play a critical role in intermodular communication 
(Gokhale et al. (1999) Science 284: 482-485). We propose that these unique features of the 
ACP domain from the BlmVIII PKS module and the C domain from the BlmVII NRPS 
module provide the molecular basis for the C domain to recognize the acyl-S-ACP as a 
substrate. Subsequent condensation catalyzed by the C domain between acyl-S-ACP and 
aminoacyl-S-PCP results in the elongation of the growing polyketide (as far as this 
condensation is concerned) with an amino acid (Fig. 4). 

Novel domains for the Blm NRPS and PKS modules. 

Various NRPS and PKS domains have been characterized, which are the 
building blocks for the entire field of combinatorial biosynthesis. The success of 
combinatorial biosynthesis depends critically upon the repertoire of these individual 
domains. Genetic analysis of the blm gene cluster has uncovered several novel NRPS and 
PKS domains. Without being bound to a particular theory, it is believed that BlmVI and 
BlmV are involved in the biosynthesis of the β-aminoalaninamide and pyrimidine moieties of 
BLM. In addition, the MT domain in BlmVIII, the Cy domains in BlmIV, and the Ox 
domain in BlmIII are novel domains. 

The BlmVIII PKS module apparently furnishes the "propionate" unit into 
BLM in two steps by evolving a malonyl CoA-specifying AT domain coupled with a novel 
S-adenosylmethionine-requiring MT domain, representing a new mechanism to introduce 
methyl branches into polyketides (Fig. 4). This biosynthetic reaction sequence is 
unprecedented for polyketide biosynthesis since all PKSs from actinomycetes examined to 
date incorporate the alkyl branches into the resultant polyketides by selecting various alkyl 
malonates as the extending units, a choice determined by the AT domains. Yet, feeding 
experiments have unambiguously established that the polyketide moiety of BLM was 
derived from an acetate and a methionine (Takita and Muroka (1990) pages 289-309 in 
Biochemistry of Peptide Antibiotics: Recent Advances in the Biotechnology of β-Lactams and 
Microbial Peptides, Kleinkauf, H. & von Döhren, H. eds., W. de Gruyter, N.Y.), a fact that 
fits well with the observed unusual domain organization of the BlmVIII PKS module (Fig. 
4). It is conceivable that the combination of this MT domain with an AT domain specific for 
a methyl malonate extending unit (Haydock et al. (1995) FEBS Lett. 374: 246-248) could 
result in the synthesis of polyketides with a gem-dimethyl moiety via engineering polyketide 
biosynthesis. Such a gem-dimethyl group has been found to be a very important 
pharmacophore for the epothilones, a family of hybrid peptide and polyketide metabolites 
that exhibits a remarkable antitumor activity similar to taxol (Ojima et al. (1999) Proc. 
Natl. Acad. Sci. USA 96: 4256-4261). 
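
The branching logic described above can be sketched as a small counting rule. This is only an illustration: the function names are hypothetical, and the sketch assumes that the MT domain adds exactly one S-adenosylmethionine-derived methyl group and that a methylmalonyl extender contributes exactly one.

```python
# Illustrative sketch only; names and the counting rule are assumptions,
# not part of the specification.

# Methyl groups contributed by the extender unit selected by the AT domain.
EXTENDER_METHYLS = {"malonyl-CoA": 0, "methylmalonyl-CoA": 1}


def alpha_methyl_count(extender, has_mt_domain):
    """Count methyl branches on one extended unit.

    Assumes the MT domain, when present, transfers a single methyl group.
    """
    return EXTENDER_METHYLS[extender] + (1 if has_mt_domain else 0)


def describe_unit(extender, has_mt_domain):
    """Name the unit that results from a given AT specificity and MT status."""
    labels = {0: "unbranched", 1: "single methyl branch", 2: "gem-dimethyl"}
    return labels[alpha_methyl_count(extender, has_mt_domain)]


if __name__ == "__main__":
    # BlmVIII as described: malonyl-specific AT plus MT gives one methyl branch.
    print(describe_unit("malonyl-CoA", True))
    # Proposed engineering: methylmalonyl-specific AT plus MT.
    print(describe_unit("methylmalonyl-CoA", True))
```

Under these assumptions the BlmVIII arrangement yields the "propionate"-like unit, and swapping in a methylmalonate-specific AT domain yields the gem-dimethyl case discussed in the text.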

The BlmIV and BlmIII NRPSs are characterized by the unusual Cy domains 
as well as the unprecedented Ox domain, providing an efficient biosynthesis for a bithiazole 
structure. The Cy domain was first defined by Marahiel and co-workers in their study of 
bacitracin biosynthesis in B. licheniformis (Konz et al. (1997) Chem. Biol. 4: 927-937), and 
the Cy activity was demonstrated recently by Walsh and co-workers in their study of the 
HMWP1 and HMWP2 proteins for yersiniabactin biosynthesis in Y. pestis (Gehring et al. 
(1998) Chem. Biol. 5: 573-586; Gehring et al. (1998) Biochemistry 37: 11637-11650). 
While thiazoline is the direct product of the Cy domain, the thiazoline-to-thiazole conversion 
requires an additional oxidation step. We identified at the C-terminus of NRPS-0 an 
additional domain that shows low, but significant, sequence homology to a family of putative 
oxidases/dehydrogenases, including the McbC protein of the microcin B17 synthase (Table 
1). Microcin B17 synthase catalyzes the synthesis of the oxazole- and thiazole-containing 
peptide antibiotic microcin B17, and McbC has been proposed to play a role in catalyzing the 
oxazoline/thiazoline-to-oxazole/thiazole conversion (Li et al. (1996) Science 274: 1188-1193; 
Milne et al. (1999) Biochemistry 38: 4768-4781). Consequently, we propose that this 
extra domain at the C-terminus of NRPS-0 could provide the oxidase/dehydrogenase activity 
needed for the biosynthesis of the bithiazole moiety of BLM, defining a novel Ox domain for 
NRPSs. It is noteworthy that a cell-free preparation from Sv ATCC15003 has been reported 
to catalyze the conversion of phleomycins to BLMs in the presence of NAD+ (Takita and 
Muroka (1990) pages 289-309 in Biochemistry of Peptide Antibiotics: Recent Advances in 
the Biotechnology of β-Lactams and Microbial Peptides, Kleinkauf, H. & von Döhren, H. 
eds., W. de Gruyter, N.Y.), supporting the hypothesis that the bithiazole moiety of BLM 
results from stepwise oxidations of a bithiazoline precursor (Fig. 1A). (The phleomycin 
producer could be imagined to result from the loss of its Ox activity for the first thiazoline 
ring.) Given the wide distribution of thiazole or oxazole rings in natural products (Ojima et 
al. (1999) Proc. Natl. Acad. Sci. USA 96: 4256-4261; Li et al. (1996) Science 274: 1188-1193) 
exhibiting an impressive array of biological activities, the cloning of the blmIV and blmIII 
genes and the identification of the Ox domain open many opportunities to define the 
mechanism for thiazole biosynthesis and to potentially synthesize novel thiazole-containing 
molecules by engineering peptide biosynthesis. 

Example 2 

Identification and characterization of a type II peptidyl carrier protein from the 
bleomycin producer Streptomyces verticillus ATCC 15003. 

Results. 

Cloning and sequence analysis of the blmI gene 

In our effort to clone the gene cluster responsible for BLM biosynthesis, we 
have determined 80 kb of DNA sequence from Sv ATCC15003 (Fig. 8). Among the orfs 
identified within the blm gene cluster is a small orf of 273 base pairs (bp), blmI, which is 
located approximately 4 kb upstream of the previously characterized blmAB resistance locus 
(Sugiyama et al. (1994) Gene 151: 11-16; Calcutt and Schmidt (1994) Gene 151: 17-21) 
(Fig. 8B). The blmI gene encodes a protein of 90 amino acids with a molecular weight of 
9957 and a pI of 6.52 (Fig. 8C). Computer-assisted analysis (Altschul et al. (1997) Nucleic 
Acids Res. 25: 3389-3402) of the deduced amino acid sequence indicates that BlmI is very 
similar to various PCP domains of NRPSs (around 40% identity and 60% similarity, 
as shown in Figure 9). Like known PCP domains of NRPSs, BlmI has the highly conserved 
signature motif of LGGXS, within which the serine residue is the site for 
4'-phosphopantetheinylation (Stachelhaus and Marahiel (1995) FEMS Microbiol. Lett. 125: 
3-14; Marahiel et al. (1997) Chem. Rev. 97: 2651-2673). The latter posttranslational 
modification is generally necessary for peptide biosynthesis, converting the apo-PCP into the 
functional holo-PCP (Marahiel et al. (1997) Chem. Rev. 97: 2651-2673; Walsh et al. (1997) 
Curr. Opin. Chem. Biol. 1: 309-315). Based on sequence comparison, BlmI is most closely 
related to PCPs and not to other kinds of carrier proteins that also share the same LGGXS 
(SEQ ID NO:80) motif and undergo the same posttranslational 4'-phosphopantetheinylation 
[31], such as the E. coli acyl carrier protein (ACP) (Lambalot and Walsh (1995) J. Biol. Chem. 
270: 24658-24661), the ACP domain of type I PKS and the type II PKS ACP (Cox and Simpson 
(1997) FEBS Lett. 405: 267-272; Carreras et al. (1997) Biochemistry 36: 11757-11761), the 
ArCP domain (Gehring et al. (1998) Biochemistry 37: 2648-2659), and several nodulation-related 
ACP-like proteins (Epple et al. (1998) J. Bacteriol. 180: 4950-4954; Spaink et al. 
(1991) Nature 354: 125-130). 
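
As a minimal sketch of the two checks described above, the snippet below relates the orf length to the protein length and scans a sequence for the LGGXS motif. The helper names and the toy sequence are hypothetical (the toy sequence is not the BlmI sequence), and the length calculation assumes that the 273 bp figure includes the stop codon.

```python
import re

# LGGXS signature motif; X is any residue, S is the modified serine.
LGGXS = re.compile(r"LGG.S")


def encoded_residues(orf_bp):
    """Residues encoded by an orf whose length is assumed to include the stop codon."""
    if orf_bp % 3 != 0:
        raise ValueError("orf length must be a multiple of three")
    return orf_bp // 3 - 1


def find_ppant_serines(protein):
    """Return 1-based positions of the serine in each LGGXS match."""
    # m.start() is the 0-based index of L; the serine sits four residues later.
    return [m.start() + 5 for m in LGGXS.finditer(protein)]


if __name__ == "__main__":
    print(encoded_residues(273))                # consistent with a 90-residue protein
    print(find_ppant_serines("MTADELGGDSLRA"))  # toy sequence, illustration only
```

With the stated assumption, 273 bp corresponds to 91 codons, that is 90 residues plus a stop codon, which matches the protein length given in the text.
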
Overexpression of blmI in E. coli 

To overexpress the blmI gene in E. coli, we directly amplified the blmI gene 
by PCR from the Sv ATCC15003 genomic DNA and cloned it into the pQE-60 expression 
vector to give pBS1 so that BlmI could be produced as a protein with a native N-terminus 
and a His6-tag at its C-terminus. However, no production of the BlmI protein was detected, 
as judged by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE), upon 
introduction of pBS1 into E. coli M15(pREP4) under the standard overexpression conditions 
recommended by the manufacturer (Qiagen). We reasoned that the small BlmI protein with 
its native N-terminus may not be stable in the heterologous host, and hence moved the blmI 
gene from pBS1 into pET-29a to yield the second overexpression construct, pBS2. In the 
latter construct, BlmI should be produced as a fusion protein with 27 extra amino acid 
residues at its N-terminus, including an S-tag and the thrombin cleavage site, in addition to 
the His6-tag at its C-terminus. Introduction of pBS2 into E. coli BL21(DE3) under the 
standard overexpression conditions recommended by the manufacturer (Novagen) indeed 
resulted in overproduction of BlmI. In fact, the bulk of the soluble protein was the 
overproduced BlmI, which was easily purified by affinity chromatography using Ni-NTA 
resin (Qiagen). It is noteworthy that fusion of the additional 23 amino acids to the 
N-terminus of BlmI as in pBS2 and change of the expression system from E. coli M15(pREP4) 
(pBS1) to E. coli BL21(DE3)(pBS2) dramatically improved the expression level of blmI. 

In vivo 4'-phosphopantetheinylation of the BlmI protein 

To establish BlmI as a type II PCP, we tested if it could serve as a substrate 
for a PCP-specific 4'-phosphopantetheinyl transferase (PPTase). PPTases catalyze the 
posttranslational modification of an apo-PCP into a holo-PCP by transferring the 
4'-phosphopantetheine moiety from coenzyme A (CoA) to the conserved serine residue of the 
PCP, and this reaction has been developed recently into a general method to prepare various 
holo-PCPs, holo-ACPs, or holo-ArCPs from the corresponding apoproteins (Stachelhaus et al. 
(1996) Chem. Biol. 3: 913-921; Gehring et al. (1998) Biochemistry 37: 2648-2659; Gehring 
et al. (1998) Biochemistry 37: 11637-11650; Weinreb et al. (1998) Biochemistry 37: 
1575-1584). Therefore, we decided to investigate the 4'-phosphopantetheinylation of BlmI 
under both in vivo (Ku et al. (1997) Chem. Biol. 4: 203-207) and in vitro (Gehring et al. 
(1998) Biochemistry 37: 11637-11650; Lambalot et al. (1996) Chem. Biol. 3: 923-936; Quadri 
et al. (1998) Biochemistry 37: 1585-1595) conditions. 

To examine 4'-phosphopantetheinylation of BlmI in vivo, we chose E. coli 
OG7001 as the expression host, which is a β-alanine auxotroph derived from E. coli 
BL21(DE3) by P1 co-transduction of the panD mutation from E. coli SJ16 (Epple et al. 
(1998) J. Bacteriol. 180: 4950-4954). Upon introduction of pBS2 into E. coli OG7001, blmI 
was exceptionally well expressed and the overproduced BlmI protein was readily purified. 
However, high performance liquid chromatography (HPLC) analysis showed that the 
purified BlmI was essentially in the apo-form (Fig. 10A), indicating that apo-BlmI was a 
poor substrate for the E. coli endogenous PPTases, such as EntD and ACP synthase 
(Lambalot et al. (1996) Chem. Biol. 3: 923-936; Walsh et al. (1997) Curr. Opin. Chem. Biol. 
1: 309-315; Lambalot and Walsh (1995) J. Biol. Chem. 270: 24658-24661). To circumvent 
the poor endogenous PPTase activity, we next co-expressed blmI with the gsp gene, which 
was isolated from the gramicidin S producer Bacillus brevis and encodes a PPTase that is 
known to 4'-phosphopantetheinylate heterologously produced PCPs in E. coli (Lambalot et 
al. (1996) Chem. Biol. 3: 923-936; Ku et al. (1997) Chem. Biol. 4: 203-207). We 
co-transformed pDPT-Gsp, in which the expression of the gsp gene was under the control of the 
T5/Lac promoter (Ku et al. (1997) Chem. Biol. 4: 203-207), and pBS2 into E. coli OG7001. 
BlmI was again very well expressed and the resulting BlmI protein was similarly purified. 
HPLC analysis showed that at least 60% of the overproduced BlmI was modified into the 
holo-BlmI protein (Fig. 10B). (A PCP domain was similarly 4'-phosphopantetheinylated in vivo 
before by co-expressing gsp in E. coli using pDPT-Gsp, and approximately 80% of the PCP 
was produced in the holo-form (Ku et al. (1997) Chem. Biol. 4: 203-207).) 

We next cultured E. coli OG7001(pBS2) and E. coli OG7001(pBS2/pDPT-Gsp) 
in the presence of [3-3H]-β-alanine, a known biosynthetic precursor of 
4'-phosphopantetheine (Stachelhaus et al. (1996) Chem. Biol. 3: 913-921; Epple et al. (1998) J. 
Bacteriol. 180: 4950-4954). Specific incorporation of [3-3H]-β-alanine into the 
4'-phosphopantetheine moiety of holo-BlmI was determined by autoradiographic analysis. 
Thus, while fermentation of E. coli OG7001(pBS2) in the presence of [3-3H]-β-alanine led 
to an IPTG-dependent overproduction of BlmI, little of the resulting BlmI protein was 
3H-labeled, indicating that it was produced in the apo-form. In contrast, fermentation of E. coli 
OG7001(pBS2/pDPT-Gsp) in the presence of [3-3H]-β-alanine resulted in a significant 
increase of IPTG-dependent incorporation of the 3H-label into the overproduced BlmI 
protein, suggesting a specific incorporation of [3-3H]-β-alanine into holo-BlmI, presumably 
in the 4'-phosphopantetheine moiety. There were several additional proteins that were also 
weakly labeled by [3-3H]-β-alanine. However, both their expression and their incorporation 
of the 3H-label were independent of either IPTG induction or the presence of Gsp; hence 
these proteins were unrelated to BlmI. (Similar background labeling was reported before for 
in vivo 4'-phosphopantetheinylation of other PCPs (Epple et al. (1998) J. Bacteriol. 180: 
4950-4954).) We also purified the BlmI protein from E. coli OG7001(pBS2/pDPT-Gsp) and 
demonstrated that it was the holo-BlmI protein that was specifically associated with the 
3H-activity. Finally, we confirmed the identity of holo-BlmI by subjecting the purified BlmI 
protein to MALDI-TOF mass spectral analysis (Weinreb et al. (1998) Biochemistry 37: 
1575-1584). BlmI produced in the absence of the Gsp PPTase yielded a single peak with a 
molecular weight of 13,952, suggesting that the produced BlmI protein is in the apo-form 
(calc. 13,949). In contrast, BlmI produced in the presence of Gsp yielded two species with 
molecular weights of 13,969 and 14,303, respectively. While the species with the molecular 
weight of 13,969 represents apo-BlmI, a molecular weight of 14,303 unambiguously 
confirmed the other protein as holo-BlmI (calc. 14,289). The latter result indicated that the 
purified BlmI consisted of both the apo- and holo-BlmI proteins, in agreement with the 
HPLC analysis results (Fig. 10B). 
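
The mass assignments above can be checked with a short calculation. The sketch below is illustrative only: it assumes that the group added to the serine is 4'-phosphopantetheine (C11H23N2O7PS) less one water, uses standard average atomic masses, and applies a hypothetical matching tolerance of 25 mass units that is not taken from the specification.

```python
# Illustrative sketch; the added-group formula and the tolerance are assumptions.

# Standard average atomic masses.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007,
               "O": 15.999, "P": 30.974, "S": 32.06}

# 4'-phosphopantetheine (C11H23N2O7PS) minus H2O lost on attachment to serine.
PPANT_ADDED = {"C": 11, "H": 21, "N": 2, "O": 6, "P": 1, "S": 1}


def formula_mass(formula):
    """Sum average atomic masses over an element-count mapping."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


PPANT_SHIFT = formula_mass(PPANT_ADDED)


def classify(observed_mass, apo_calc, tolerance=25.0):
    """Assign an observed mass to the apo or holo form, or leave it unassigned."""
    holo_calc = apo_calc + PPANT_SHIFT
    if abs(observed_mass - apo_calc) <= tolerance:
        return "apo"
    if abs(observed_mass - holo_calc) <= tolerance:
        return "holo"
    return "unassigned"


if __name__ == "__main__":
    print(round(PPANT_SHIFT, 1))
    for mass in (13952, 13969, 14303):
        print(mass, classify(mass, apo_calc=13949))
```

The computed shift of about 340 agrees with the difference between the calculated holo (14,289) and apo (13,949) values quoted in the text.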

In vitro 4'-phosphopantetheinylation of the BlmI protein 

To investigate 4'-phosphopantetheinylation of BlmI in vitro, we chose the Sfp 
protein as the preferred PPTase, which had been isolated before from the surfactin producer 
Bacillus subtilis (Nakano et al. (1992) Mol. Gen. Genet. 232: 313-321). (Overexpression of 
gsp in E. coli using pDPT-Gsp resulted in predominantly an insoluble Gsp protein (Ku et al. 
(1997) Chem. Biol. 4: 203-207).) The Sfp PPTase was overproduced in E. coli 
MV1190(pUC8-Sfp) and purified to near homogeneity as described before (Quadri et al. 
(1998) Biochemistry 37: 1585-1595; Nakano et al. (1992) Mol. Gen. Genet. 232: 313-321). 
Upon incubation of the purified apo-BlmI with [3H-pantetheine]-CoA in the presence of the 
Sfp PPTase, we examined the covalent incorporation of the [3H-pantetheine]-4'-phosphopantetheine 
moiety from CoA into holo-BlmI by autoradiographic analysis. Indeed, 
the apo-BlmI was quantitatively labeled by [3H-pantetheine]-CoA, and no labeling was 
observed in the absence of either the apo-BlmI or the Sfp PPTase protein, demonstrating that 
the Sfp PPTase can recognize apo-BlmI as a substrate and specifically transfer the 
4'-phosphopantetheine group from CoA to apo-BlmI to yield holo-BlmI. 

In vitro aminoacylation of BlmI 

Once we established BlmI as a type II PCP that can be readily modified by 
PCP-specific PPTases into the holo-BlmI protein, we tested if the holo-BlmI could be 
aminoacylated in trans, which requires an A domain. Since BlmI has no cognate A domain of its 
own, we turned our attention to another putative biosynthesis gene cluster we have cloned 
previously from Sv ATCC15003, which encodes at least four NRPS modules and one PKS module. 
We have established that this gene cluster is not clustered with the blm locus and is unrelated 
to BLM biosynthesis. From this gene cluster, we amplified by PCR a 1579 bp fragment 
encoding an A domain, named Val-A, which we predicted to have a molecular weight of 
56,581 and a pI of 7.39. We cloned val-A into pET-28a to yield pBS3, in which Val-A 
would be produced as a fusion protein with a His6-tag at the N-terminus. Introduction of 
pBS3 into E. coli BL21(DE3) under the standard overexpression conditions recommended 
by the manufacturer (Novagen) resulted in good overproduction of Val-A, predominantly in 
soluble form, from which Val-A was purified by affinity chromatography using Ni-NTA 
resin. The purified Val-A protein was active in the amino acid-dependent ATP-PPi 
exchange assay (Lee and Lipmann (1970) Methods Enzymol. 43: 585-602; Ku et al. (1997) 
Chem. Biol. 4: 203-207). Among the 23 amino acids tested, Val-A specifically activated 
valine, an amino acid that is not required for BLM biosynthesis. 

To carry out the aminoacylation in trans, we incubated the purified holo-BlmI 
and Val-A in vitro in the presence of L-[14C(U)]valine and ATP (Stachelhaus et al. (1996) 
Chem. Biol. 3: 913-921; Weinreb et al. (1998) Biochemistry 37: 1575-1584). The 
aminoacylated holo-BlmI-L-[14C(U)]valine species was subjected to SDS-PAGE, and specific 
attachment of L-[14C(U)]valine to holo-BlmI was determined by autoradiographic analysis. 
Remarkably, the holo-BlmI was specifically labeled by L-[14C(U)]valine in the presence of 
Val-A, indicating the formation of the holo-BlmI-S-valine thioester. The in trans 
aminoacylation between the holo-BlmI and Val-A proteins appeared to be very specific. 
Neither incubation of L-[14C(U)]valine with Val-A, the apo-BlmI, or the holo-BlmI protein 
alone, nor incubation of L-[14C(U)]valine with the Val-A and apo-BlmI proteins, resulted in 
the detection of 14C-labeled BlmI protein. 
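
The control reactions above amount to a simple rule: labeled BlmI is expected only when the carrier is in the holo-form and the A domain is present. The sketch below encodes that rule and compares it with the outcomes reported in the text. The function and variable names are hypothetical, and every reaction is assumed to contain labeled valine and ATP.

```python
# Illustrative sketch; names are assumptions, outcomes are those reported in the text.


def expect_labeled_carrier(carrier, a_domain_present):
    """Predict labeling: needs the holo carrier (free thiol) and the A domain."""
    return carrier == "holo" and a_domain_present


# (carrier form, Val-A present) -> was 14C-labeled BlmI detected?
REPORTED = {
    (None, True): False,     # Val-A alone
    ("apo", False): False,   # apo-BlmI alone
    ("holo", False): False,  # holo-BlmI alone
    ("apo", True): False,    # Val-A with apo-BlmI
    ("holo", True): True,    # Val-A with holo-BlmI
}


def model_matches_reported():
    """Check the rule against every reported reaction."""
    return all(
        expect_labeled_carrier(carrier, a_domain) == outcome
        for (carrier, a_domain), outcome in REPORTED.items()
    )


if __name__ == "__main__":
    print(model_matches_reported())
```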

Discussion. 

Nonribosomal peptides and polyketides are two distinct classes of natural 
products, yet they are assembled from amino acids and short carboxylic acids by NRPSs and 
PKSs, respectively, by strikingly similar strategies (Cane et al. (1998) Science 282: 63-68). 
These fascinating multifunctional enzyme complexes have been classified into two types 
based on their gene organization and enzyme architecture. Type I enzymes are 
multifunctional proteins consisting of domains for individual enzyme activities, and type II 
enzymes are multienzyme complexes consisting of discrete proteins that are largely 
monofunctional. While both type I and type II PKSs (Fig. 11A and 11C) have been well 
characterized to account for the vast structural diversity found in polyketide biosynthesis 
(Hopwood (1997) Chem. Rev. 97: 2465-2497), all NRPSs studied so far are exclusively 
type I modular enzymes (Fig. 11B) (Kleinkauf and von Döhren (1996) Eur. J. Biochem. 
236: 335-351; Marahiel et al. (1997) Chem. Rev. 97: 2651-2673; von Döhren et al. (1997) 
Chem. Rev. 97: 2675-2705). It is very tempting to speculate on the existence of a type II NRPS 
that, analogous to type II PKS (Shen and Hutchinson (1993) Science 262: 1535-1540; Bao et 
al. (1998) Biochemistry 37: 8132-8138; Carreras and Khosla (1998) Biochemistry 37: 
2084-2088), should consist of discrete proteins possessing enzyme activities such as the A 
(Stachelhaus and Marahiel (1995) J. Biol. Chem. 270: 6163-6169), the PCP (Stein and Morris 
(1996) J. Biol. Chem. 271: 15428-15435), or the C (Stachelhaus et al. (1998) J. Biol. Chem. 
273: 22773-22781) domains of type I NRPSs (Fig. 11D). The fact that both the A 
(Stachelhaus and Marahiel (1995) J. Biol. Chem. 270: 6163-6169; Konz et al. (1997) Chem. 
Biol. 4: 927-937; Weinreb et al. (1998) Biochemistry 37: 1575-1584; Mootz and Marahiel 
(1997) J. Bacteriol. 179: 6843-6850) and the PCP (Stachelhaus et al. (1996) Chem. Biol. 3: 
913-921; Weinreb et al. (1998) Biochemistry 37: 1575-1584; Pfeifer et al. (1995) 
Biochemistry 34: 7450-7459; Haese et al. (1994) J. Mol. Biol. 243: 116-122; Lambalot et al. 
(1996) Chem. Biol. 3: 923-936; Quadri et al. (1998) Biochemistry 37: 1585-1595; Gehring et 
al. (1996) Chem. Biol. 4: 17-24; Ku et al. (1997) Chem. Biol. 4: 203-207) domains of type I 
NRPSs can act as independent enzymes supports the hypothesis of a type II NRPS. 

We have now cloned and sequenced the blmI gene, overproduced and 
characterized the BlmI protein as a bona fide type II PCP, and demonstrated that holo-BlmI 
can be aminoacylated by a completely unrelated A domain, providing for the first time 
genetic and biochemical evidence for a type II NRPS enzyme. We conclude that BlmI is a type 
II PCP based on the following criteria. (1) The deduced amino acid sequence of the blmI 
gene is highly homologous to various PCP domains of known NRPSs, in particular at the 
signature motif of LGGXS, within which the 4'-phosphopantetheine prosthetic group is 
covalently attached to the serine residue (Marahiel et al. (1997) Chem. Rev. 97: 2651-2673; 
Stachelhaus and Marahiel (1995) FEMS Microbiol. Lett. 125: 3-14). While the current 
boundaries for a PCP domain in the literature were defined arbitrarily (Stachelhaus et al. 
(1996) Chem. Biol. 3: 913-921) and varied from one PCP to another, we can now re-define a 
PCP domain for the type I NRPS as a 90 amino acid peptide, with approximately 45 amino 
acids flanking each side of the essential serine residue in the LGGXS (SEQ ID NO:81) motif, 
in light of this discrete BlmI type II PCP (Fig. 9). (2) The blmI gene has been successfully 
expressed in E. coli, and fusion of a short peptide to the N-terminus of BlmI dramatically 
improved its overproduction efficiency. While we cannot exclude the effect of different 
systems on gene expression, i.e., E. coli M15(pREP4)(pBS1) vs. E. coli BL21(DE3)(pBS2), 
we attribute the increase in expression efficiency to the stability of BlmI as an N-terminal 
fusion protein instead of the otherwise labile BlmI protein with its native N-terminus. Since 
BlmI was produced predominantly in the apo-form in E. coli, apo-BlmI apparently was not a 
substrate for the endogenous PPTases, such as EntD or ACP synthase, excluding BlmI as an 
ArCP or ACP, respectively. EntD and ACP synthase are known to efficiently 
4'-phosphopantetheinylate apo-ArCP and apo-ACP, respectively, to their holo-forms 
(Lambalot et al. (1996) Chem. Biol. 3: 923-936; Walsh et al. (1997) Curr. Opin. Chem. Biol. 
1: 309-315; Lambalot and Walsh (1995) J. Biol. Chem. 270: 24658-24661). (3) The 
apo-BlmI protein serves as a substrate for PCP-specific PPTases that transfer the 
4'-phosphopantetheine moiety from CoA to apo-BlmI to yield the holo-BlmI protein. We have 
demonstrated this posttranslational modification for BlmI in vivo with the Gsp PPTase (Ku 
et al. (1997) Chem. Biol. 4: 203-207) and in vitro with the Sfp PPTase (Gehring et al. (1998) 
Biochemistry 37: 11637-11650; Lambalot et al. (1996) Chem. Biol. 3: 923-936; Quadri et al. 
(1998) Biochemistry 37: 1585-1595), both of which have been extensively used in preparing 
holo-PCPs. (4) The specific modification of apo-BlmI by 4'-phosphopantetheinylation has 
been monitored by HPLC analysis (Fig. 10) (Weinreb et al. (1998) Biochemistry 37: 
1575-1584) and by specific incorporation of [3-3H]-β-alanine in vivo (Stachelhaus et al. (1996) 
Chem. Biol. 3: 913-921; Ku et al. (1997) Chem. Biol. 4: 203-207; Epple et al. (1998) J. 
Bacteriol. 180: 4950-4954) and of [3H-pantetheine]-CoA in vitro (Gehring et al. (1998) 
Biochemistry 37: 11637-11650; Lambalot et al. (1996) Chem. Biol. 3: 923-936; Quadri et al. 
(1998) Biochemistry 37: 1585-1595), respectively, into the 4'-phosphopantetheine moiety of 
the holo-BlmI protein. The identity of BlmI was finally confirmed by MALDI-TOF mass 
spectral analysis that determined the molecular weight for both the apo- and holo-BlmI 
proteins. 

While individual domains of type I NRPSs can function independently and 
several A (Stachelhaus and Marahiel (1995) J. Biol. Chem. 270: 6163-6169; Konz et al. 
(1997) Chem. Biol. 4: 927-937; Weinreb et al. (1998) Biochemistry 37: 1575-1584; Mootz 
and Marahiel (1997) J. Bacteriol. 179: 6843-6850) and PCP (Stachelhaus et al. (1996) 
Chem. Biol. 3: 913-921; Weinreb et al. (1998) Biochemistry 37: 1575-1584; Pfeifer et al. 
(1995) Biochemistry 34: 7450-7459; Haese et al. (1994) J. Mol. Biol. 243: 116-122; 
Lambalot et al. (1996) Chem. Biol. 3: 923-936; Quadri et al. (1998) Biochemistry 37: 
1585-1595; Gehring et al. (1996) Chem. Biol. 4: 17-24; Ku et al. (1997) Chem. Biol. 4: 203-207) 
domains have been overproduced, purified, and biochemically characterized, aminoacylation 
in trans has been successful only between PCPs and their cognate A domains (Stachelhaus et 
al. (1996) Chem. Biol. 3: 913-921; Weinreb et al. (1998) Biochemistry 37: 1575-1584). No 
aminoacylation between PCP and A domains from different NRPS modules has been 
observed. These results led to the conclusion that there is a specific protein-protein 
recognition between the A domain and its cognate PCP (Weinreb et al. (1998) Biochemistry 
37: 1575-1584). Such domain-specific aminoacylation, in fact, should be beneficial in 
maintaining the fidelity of a type I NRPS by providing additional "gating" against 
misincorporation of non-specifically activated aminoacyl adenylate into the final peptide 
product. Since a type II PCP such as BlmI lacks its cognate A domain, we asked if BlmI 
could be aminoacylated by an unrelated A domain of a type I NRPS. Although we have yet 
to determine the biochemical role of BlmI in vivo, the fact that the blmI gene is located in the 
middle of the blm gene cluster suggests that it may be involved in BLM biosynthesis. To 
avoid the ambiguity of selecting an A domain that may potentially interact with BlmI in 
vivo, we preferred not to choose any A domain from the blm gene cluster to test if it could 
aminoacylate BlmI in trans. We reasoned that an A domain that is unrelated to BlmI should 
come from a gene cluster independent of BLM biosynthesis and should activate an amino 
acid not required by BLM. We chose Val-A because it satisfied both requirements. Val-A is 
an A domain of a type I NRPS from a gene cluster we have cloned previously from Sv 
ATCC15003 that has proven to be unrelated to BLM biosynthesis, and it specifically 
activates valine among the 23 amino acids tested. Remarkably, BlmI was efficiently 
aminoacylated by Val-A. The valine residue is specifically attached in a thioester linkage to 
the terminal -SH of the 4'-phosphopantetheine moiety of the holo-BlmI protein, as evidenced 
by the fact that the apo-BlmI was inactive under identical conditions. 

Aminoacylation of holo-BlmI by Val-A represents the first example in which 
an A domain aminoacylates a protein other than its cognate PCP domain. Since it has been 
suggested that an A domain of a type I NRPS can transfer the activated aminoacyl adenylate 
only to its cognate PCP domain because of the specific protein-protein recognition between 
the two domains (Weinreb et al. (1998) Biochemistry 37: 1575-1584), the fact that BlmI is 
aminoacylated by Val-A reveals a distinct feature of a type II PCP. It is very tempting to 
speculate that type II PCPs such as BlmI may have broad intrinsic substrate specificity 
toward either the aminoacyl adenylate, the A domain, or both. In fact, the latter feature is 
reminiscent of the type II PKS ACPs, which have been shown to be interchangeable among 
different PKS complexes (Shen and Hutchinson (1993) Science 262: 1535-1540; Bao et al. 
(1998) Biochemistry 37: 8132-8138; Carreras and Khosla (1998) Biochemistry 37: 
2084-2088). The biosynthesis of D-alanyl-lipoteichoic acid in Bacillus subtilis (Perego et al. 
(1995) J. Biol. Chem. 270: 15598-15606) and Lactobacillus casei (Debabov et al. (1996) 
178: 3869-3876) also involves a discrete ACP-like protein, the D-alanyl carrier protein, 
although the latter clearly is structurally and functionally different from PCPs. 

The results strongly suggest the existence of a type II NRPS. In fact, we have
already identified within the blm gene cluster two additional genes, Mn/Z and Wm*/ (Fig.
1B), which encode type II C proteins based on sequence analysis (see Example 1).

Significance.

All NRPSs known to date are exclusively the type I modular enzymes that are
multifunctional proteins consisting of domains, such as A (Stachelhaus and Marahiel (1995) J.
Biol. Chem. 270: 6163-6169), PCP (Stachelhaus et al. (1996) Chem. Biol. 3: 913-921), and C
(Stachelhaus et al. (1998) J. Biol. Chem. 273: 22773-22781), for individual enzyme activities
(Kleinkauf and von Döhren (1996) Eur. J. Biochem. 236: 335-351; Marahiel et al. (1997)
Chem. Rev. 97: 2651-2673; von Döhren et al. (1997) Chem. Rev. 97: 2675-2705), and
control the structural variations of the resulting peptide products by the multiple-carrier
thiotemplate mechanism (Cane et al. (1998) Science 282: 63-68; Stein and Morris (1996) J.
Biol. Chem. 271: 15428-15435). While individual domains of type I NRPSs can function
independently, aminoacylation in trans has been successful only between PCPs and their
cognate A domains (Stachelhaus et al. (1996) Chem. Biol. 3: 913-921; Weinreb et al. (1998)
Biochemistry 37: 1575-1584). We have cloned and sequenced the blmI gene, overproduced
and characterized the BlmI protein as a bona fide type II PCP, and demonstrated that the
holo-BlmI can be aminoacylated by a completely unrelated A domain. Our results provided
for the first time the genetic and biochemical evidence to support the hypothesis of a type II
NRPS, setting the stage for formulating new research concepts to study peptide biosynthesis.
Genetic manipulation of type I NRPS has already been successful in generating novel
peptides (Stachelhaus et al. (1995) Science 269: 69-72). An unprecedented type II NRPS
should shed new light on engineering NRPS proteins, greatly increasing our ability to access
peptides with even greater structural diversity.

Materials and methods 

General DNA manipulations

Plasmid preparation and DNA extraction were carried out by using
commercial kits (Qiagen, Santa Clarita, CA), and all other manipulations were carried out
according to standard methods (Sambrook et al. (1989) Molecular Cloning: A Laboratory
Manual, 2nd ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, USA). E. coli
strain DH5α was used as the host for general DNA propagation.

Overexpression of blmI in E. coli and purification of the BlmI protein

The blmI gene was amplified from Sv ATCC15003 by PCR using a forward
primer of 5'-CCG CCCATGGGT GCT CCG CGT GGC GAG CGG ACC CGG CGC-3'
(SEQ ID NO:82, the NcoI site is underlined) and a reverse primer of 3'-CCT AGATCT
CCG GTC CCG CTC CCC CGT-5' (SEQ ID NO:83, the BglII site is underlined). In order
to create the NcoI site, the original starting sequence of "ATG AGC" has been changed to
"ATG GGT", which resulted in the change of the second amino acid from serine to glycine.
The first five codons of blmI were also optimized for overexpression in E. coli. The
PCR-amplified 0.3 kb NcoI-BglII fragment was cloned into the similar sites of pQE-60 (Qiagen)
to form pBS1. Digestion of pBS1 with NcoI and HindIII and cloning the resulting 0.3 kb
NcoI-HindIII fragment into the same sites of pET-29a (Novagen, Madison, WI) yielded
pBS2.
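The codon change described above can be checked with a short sketch. It is only an illustration: the codon table is deliberately partial (just the three codons mentioned in the text), and the names CODON_TABLE, NCOI_SITE, translate and has_ncoi_site are hypothetical helpers, not part of the described procedure. The two C bases placed in front of the start codon are the ones that immediately precede ATG in the forward primer.

```python
# Partial codon table: only the codons named in the text are included.
CODON_TABLE = {"ATG": "M", "AGC": "S", "GGT": "G"}
NCOI_SITE = "CCATGG"  # NcoI recognition sequence


def translate(dna: str) -> str:
    """Translate an in-frame DNA string using the partial table above."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def has_ncoi_site(dna: str) -> bool:
    """Return True if the NcoI recognition sequence occurs in dna."""
    return NCOI_SITE in dna


original_start = "ATGAGC"   # "ATG AGC" in the native gene
modified_start = "ATGGGT"   # "ATG GGT" introduced by the forward primer

print(translate(original_start), translate(modified_start))
# A CC dinucleotide precedes ATG in the primer; only the modified
# start sequence completes the CCATGG site.
print(has_ncoi_site("CC" + original_start), has_ncoi_site("CC" + modified_start))
```

The second codon must begin with G for the site to overlap the start codon, which is why the serine codon could not be retained.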

Expression of blmI in E. coli M15(pREP4)(pBS1) and in E. coli BL-21(DE-3)(pBS2)
and purification of the resulting BlmI protein by affinity chromatography on
Ni-NTA resin were carried out under the standard conditions recommended by Qiagen and
Novagen, respectively. The incubation temperature was lowered to 30 °C to improve the
solubility. The purification of BlmI was monitored by SDS-PAGE on a 15% gel. The final
pure BlmI protein was desalted on a PD-10 column (Sephadex G-25, Pharmacia Biotech,
Piscataway, NJ) into 50 mM sodium phosphate buffer, pH 7.8, containing 200 mM NaCl, 10
mM MgCl2, 2 mM dithiothreitol (DTT), 1 mM EDTA, and 10% glycerol, and stored at -80 °C
for in vitro assays.
HPLC analysis and MALDI-TOF mass spectral determination

Samples of BlmI (30-70 µg) purified from E. coli OG7001(pBS2) or E. coli
OG7001(pBS2/pDPT-Gsp) were analyzed on a Nova-Pak C18 column (5mm x 10, Waters,
Milford, MA) using a Rainin DMAX HPLC unit. The column was developed by a linear
gradient of 0-50% acetonitrile in 0.1% trifluoroacetic acid in 25 min, followed by an additional
5 min at 50% acetonitrile, with a flow rate of 0.6 ml/min and detection at 280 nm.
MALDI-TOF mass spectral determination was performed on a Bruker Biflex III spectrometer at the
Facility for Advanced Instrumentation of the University of California, Davis.
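The elution program described above (linear 0-50% acetonitrile over 25 min, then 5 min at 50%, at 0.6 ml/min) can be written out as a small sketch. The function and constant names are hypothetical and chosen only for illustration; the numbers are the ones stated in the text.

```python
RAMP_MIN = 25.0        # duration of the linear gradient (min)
HOLD_MIN = 5.0         # duration of the hold at the final composition (min)
FINAL_PERCENT = 50.0   # final acetonitrile percentage
FLOW_ML_PER_MIN = 0.6  # flow rate (ml/min)


def acetonitrile_percent(t_min: float) -> float:
    """Acetonitrile percentage at time t_min of the 30 min program."""
    total = RAMP_MIN + HOLD_MIN
    if not 0.0 <= t_min <= total:
        raise ValueError("time is outside the elution program")
    if t_min <= RAMP_MIN:
        # Linear ramp from 0% to the final percentage.
        return FINAL_PERCENT * t_min / RAMP_MIN
    # Isocratic hold after the ramp.
    return FINAL_PERCENT


def total_volume_ml() -> float:
    """Total mobile-phase volume delivered over the whole program."""
    return FLOW_ML_PER_MIN * (RAMP_MIN + HOLD_MIN)


for t in (0.0, 12.5, 25.0, 28.0):
    print(t, acetonitrile_percent(t))
print(total_volume_ml())
```

The ramp corresponds to a slope of 2% acetonitrile per minute, which is useful when comparing retention times between runs.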

In vivo labeling of BlmI with [3-3H]-β-alanine

The β-alanine auxotroph E. coli strain OG7001 (Epple et al. (1998) J.
Bacteriol. 180: 4950-4954) was transformed with pBS2 and cultured under the same
conditions as for E. coli BL21(DE3) (Novagen). For co-expression of blmI with gsp,
pDPT-Gsp (Ku et al. (1997) Chem. Biol. 4: 203-207) was similarly transformed into E. coli
OG7001(pBS2) and the transformants were cultured in 2xYT (Debabov et al. (1996) 178:
3869-3876) in the presence of kanamycin (25 µg/ml) and chloramphenicol (50 µg/ml). For
the in vivo labeling experiment, cells from a 2 ml overnight culture of either E. coli
OG7001(pBS2) or E. coli OG7001(pBS2/pDPT-Gsp) were harvested, washed with M9
minimal medium (Debabov et al. (1996) 178: 3869-3876), and re-suspended in 2 ml of M9
minimal medium. The latter were used as seed cultures (20 µl) to inoculate 1 ml M9
medium with kanamycin (25 µg/ml) or kanamycin (25 µg/ml) and chloramphenicol (50
µg/ml) for E. coli OG7001(pBS2) or E. coli OG7001(pBS2/pDPT-Gsp), respectively. The
resulting culture was incubated at 30 °C, 250 rpm to an OD600nm of 0.6, and to this was added 10
µCi of [3-3H]-β-alanine (50 Ci/mmol, American Radiolabeled Chemicals Inc., St. Louis, MO)
with or without IPTG (1 mM). Total proteins were resolved by SDS-PAGE on 15% gels
that were Coomassie blue-stained. To determine 3H-labeling of the overproduced holo-BlmI
protein, gels were soaked in Amplifier (Amersham, Arlington Heights, IL) for 20 min, dried
between two sheets of cellulose membrane (KOH Development Inc., Ann Arbor, MI), and
visualized by autoradiography on X-ray films (Fuji Medical Systems, Stamford, CT).

In vitro labeling of BlmI with [3H-pantetheine]-CoA

Expression of sfp in E. coli MV1190(pUC8-Sfp), purification of the Sfp
PPTase to homogeneity, and 4'-phosphopantetheinylation of apo-BlmI by Sfp in vitro were
carried out essentially according to literature procedures (Quadri et al. (1998) Biochemistry
37: 1585-1595; Nakano et al. (1992) Mol. Gen. Genet. 232: 313-321). A typical 100 µl
assay solution contained 26 µM apo-BlmI, 2.9 µM Sfp, 25 µM [3H-pantetheine]-CoA (0.9
µCi, 40 Ci/mmol), 10 mM MgCl2, and 5 mM DTT, in 75 mM MES/NaOAc buffer, pH 6.0.
After 30 min incubation at 37 °C, the assays were stopped by addition of 5 µl of bovine
serum albumin (0.2 mg/ml) and 0.9 ml of cold 10% (v/v) trichloroacetic acid (TCA). The
precipitated proteins were collected by centrifugation at 14,000 rpm, 20 min, 4 °C
(Eppendorf 5415C centrifuge), washed with 10% TCA three times, and resolved by
SDS-PAGE on a 15% gel. The 3H-activity incorporated into holo-BlmI was similarly determined
by autoradiography as described for in vivo labeling of holo-BlmI with [3-3H]-β-alanine.
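As a worked illustration of the assay composition above, the molar amounts in a 100 µl reaction follow directly from the stated concentrations. The sketch below is not part of the procedure; the helper name nmol_in_assay and the variable names are hypothetical, and the concentrations are taken as 26 µM apo-BlmI, 2.9 µM Sfp and 25 µM labeled CoA carrying 0.9 µCi in total.

```python
ASSAY_VOLUME_UL = 100.0  # assay volume in microliters


def nmol_in_assay(conc_um: float, volume_ul: float = ASSAY_VOLUME_UL) -> float:
    """Amount in nmol for a concentration in µM and a volume in µl.

    1 µM = 1 nmol/ml, so nmol = µM * µl / 1000.
    """
    return conc_um * volume_ul / 1000.0


apo_blmi_nmol = nmol_in_assay(26.0)
sfp_nmol = nmol_in_assay(2.9)
coa_nmol = nmol_in_assay(25.0)

# Radioactivity per nmol of CoA actually present in the assay.
# Note that 1 µCi/nmol is numerically equal to 1 Ci/mmol.
coa_uci_per_nmol = 0.9 / coa_nmol

print(apo_blmi_nmol, sfp_nmol, coa_nmol)
print(coa_uci_per_nmol)
print(apo_blmi_nmol / sfp_nmol)  # substrate to enzyme molar ratio
```

Under these assumptions the carrier protein and CoA are present in nearly equimolar amounts, with Sfp at roughly one-ninth of the substrate concentration.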

Overexpression of val-A in E. coli and purification and assay of the Val-A
protein

The val-A fragment was amplified from Sv ATCC15003 by PCR using a
forward primer of 5'-GGA ATT CCATATGGG CAC CAC CGT CGC CGC G-3' (SEQ ID
NO:84, the NdeI site is underlined), and a reverse primer of 3'-GGC AAGCTT GGG ACC
GGG CGT GGA GCG C-5' (SEQ ID NO:85, the HindIII site is underlined). The
PCR-amplified 1.6 kb NdeI-HindIII fragment was cloned in the similar sites of pET-28a (Novagen)
to yield pBS3. Expression of val-A in E. coli BL-21(DE-3)(pBS3) and purification of the
resulting Val-A protein by affinity chromatography on Ni-NTA resin were carried out under
the standard conditions recommended by Novagen.

Amino acid-dependent ATP-PPi assays were performed essentially according
to the literature procedures (Ku et al. (1997) Chem. Biol. 4: 203-207; Lee and Lipmann
(1970) Methods Enzymol. 43: 585-602). A typical 100 µl assay solution contained 180 nM
Val-A, 1 mM ATP, 0.1 mM PPi with 0.2 µCi of 32P-PPi (11-75 Ci/mmol, NEN Life Science
Products, Inc., Boston, MA), 1 mM MgCl2, 0.1 mM EDTA, and 1 mM L-amino acid in 50
mM sodium phosphate buffer, pH 7.8. After 30 min incubation at 30 °C, the assays were
stopped by addition of 0.9 ml of cold 1% (w/v) activated charcoal in 3% (v/v) perchloric
acid. The precipitates were collected on glass fiber filters (2.4 cm, G-4, Fisher, Pittsburgh,
PA), washed successively with 10 ml of 0.2 M sodium phosphate buffer, pH 8.0, 4 ml water,
and 1 ml of ethanol, and dried in air. The filters were mixed with 7 ml of scintillation fluid
(ScintiSafe Gel, Fisher) and counted on a Beckman LS-6800 scintillation counter to
determine the radioactivity.
In vitro aminoacylation of holo-BlmI by Val-A

The aminoacylation of holo-BlmI was carried out essentially according to
literature methods (Stachelhaus et al. (1996) Chem. Biol. 3: 913-921; Weinreb et al. (1998)
Biochemistry 37: 1575-1584). A typical 100 µl assay solution contained 180 nM Val-A,
1.5-2.8 µM apo- or holo-BlmI, 35 nM L-[14C(U)]-valine (283 mCi/mmol, NEN Life Science
Products, Inc., Boston, MA), 5 mM ATP, 10 mM MgCl2, and 5 mM DTT in 75 mM
Tris-HCl buffer, pH 8.0. The reactions were started by the addition of ATP and, after incubation
at 37 °C for 30 min, were stopped by addition of 0.9 ml of cold 7% (v/v) TCA. The
precipitated proteins were collected by centrifugation at 14,000 rpm, 20 min, 4 °C
(Eppendorf 5415C centrifuge) and resolved by SDS-PAGE on a 15% gel. The radioactivity
incorporated into the holo-BlmI-L-[14C(U)]valine species was similarly determined by
autoradiography as described for in vivo labeling of holo-BlmI with [3-3H]-β-alanine.

Example 3: 

Cloning and characterization of a phosphopantetheinyl transferase from the
bleomycin-producing Streptomyces verticillus ATCC15003
Multienzyme complexes exist for acyl group activation and transfer reactions
in the biogenesis of fatty acids, the polyketide family of natural products (e.g. erythromycin,
tetracycline), and almost all non-ribosomal peptides (e.g. vancomycin, cyclosporin,
penicillin). All of these complexes contain one or more small proteins, ~80-100 amino acids
long, either as separate subunits or as integrated domains, that function as carrier proteins for
the growing acyl chain (acyl-, peptidyl-, and aryl- carrier proteins, abbreviated as ACP, PCP,
and ArCP). They are converted from inactive apo-forms to functional holo-forms by the
covalent attachment of the 4'-phosphopantetheine moiety of coenzyme A to a conserved
serine residue of the carrier-protein substrate. This essential post-translational modification
is catalyzed by a family of enzymes known as phosphopantetheinyl transferases (PPTases)
(Lambalot et al. Chem. Biol. (1996) 3:923-936; Walsh et al. Curr. Opin. Chem. Biol.
(1997) 1:309-315).

Research in the field of polyketide and non-ribosomal peptide biosynthesis
has been hampered by the inability to fully modify, and thus convert to the active form, some
polyketide synthases (PKS) and non-ribosomal peptide synthetases (NRPS) when overproduced in
heterologous hosts, presumably because the host PPTases are unable to effectively modify
these overexpressed protein substrates. Our group is currently involved in the
characterization of the gene cluster responsible for the biosynthesis of the antitumor drug
bleomycin in Streptomyces verticillus ATCC15003. As bleomycin synthetase is a hybrid
NRPS/PKS enzyme, we decided to obtain a PPTase from the producing organism in order to
use it in vitro or in vivo by coexpression with the synthetase genes to produce properly
modified, active synthetases for our studies.

Results and Discussion 

Cloning of the pptA gene from S. verticillus ATCC15003.

The similarities among PPTases from different organisms are reduced to two
short motifs separated by 40-45 residues: (V/I)G(V/I)D, and (F/W)(S/C/T)XKE(A/S)hhK
(Lambalot et al. Chem. Biol. (1996) 3:923-936; Walsh et al. Curr. Opin. Chem. Biol.
(1997) 1:309-315). Our previous attempts to amplify PPTase sequences from S. verticillus
chromosomal DNA using degenerate primers according to the two conserved motifs were
unsuccessful (unpublished results), so we decided to narrow our target. PPTases have been
classified in two groups, according to their specificity for the carrier-protein substrate:
PPTases involved in polyketide/fatty acid biosynthesis use acyl carrier proteins (ACPs) as
substrate, while those for non-ribosomal peptide biosynthesis use peptidyl carrier proteins
(PCPs) or aryl carrier proteins (ArCPs) (Walsh et al. Curr. Opin. Chem. Biol. (1997)
1:309-315). Several "NRPS-type" PPTase sequences were used to screen the databases to
look for actinomycete homologues, and four proteins of unknown function were found:
NshC from Streptomyces actuosus (Li et al. Gene (1990) 91:9-17), SC5A7.23 from S.
coelicolor (GenBank AL031107), an unnamed protein from Streptomyces sp. strain TH1
(Mori et al. J. Bacteriol. (1997) 179:5677-5683), and Rv2794c (later renamed PptT
(Quadri et al. Chem. Biol. (1998) 5:631-645)) from Mycobacterium tuberculosis (GenBank
AL008967). The alignment of the actinomycete sequences showed the two motifs conserved
in all PPTases and an additional motif, the "THC" motif: PXWPXGX2GS(M/L)THCXGY
(SEQ ID NO:86), located about 15 amino acids upstream of the (V/I)G(V/I)D motif (SEQ ID
NO:87). The "THC" motif is not universally conserved in all PPTases, but it can be detected
also in some non-actinomycete PPTases like EntD (Coderre et al. J. Gen. Microbiol.
(1989) 135:3043-3055). Using a recently developed method of PCR primer design (the
CODEHOP strategy (COnsensus-DEgenerate Hybrid Oligonucleotide Primer) (Rose et al.
Nucleic Acids Res. (1998) 26:1628-1635)), two primers were designed around the typical
C-terminal PPTase motif (primers KEA-1: 5'-T GCA GCA GAA CAG GAG GCK NYC CCA
NKG-3' (SEQ ID NO:88) and KEA-2: 5'-TG GGT CAG CGG GTA CCA NRC YTT RWA-3'
(SEQ ID NO:89, H=C+A, N=A+C+T+G, Y=C+T, K=G+T, R=A+G, W=T+A)), and one
primer was designed from the "THC" motif (primer THC: 5'-C GGC ATG GTC GGC TCC
HTN ACN CAY TG-3', SEQ ID NO:90, H=C+A, N=A+C+T+G, Y=C+T, K=G+T,
R=A+G, W=T+A; this motif is not universally conserved in PPTases of all organisms).
Using S. verticillus chromosomal DNA as template, no amplification product was detected
using the THC and the KEA-1 primers. The set of primers THC/KEA-2 successfully
amplified a single band of the expected size (about 250 bp), which was gel-purified and
cloned. Eight individual clones were sequenced, and all of them proved to be identical
(except for differences due to primer utilization) and highly similar to the putative actinomycete
PPTases. The PCR fragment was used as a probe to screen an S. verticillus genomic library
by colony hybridization. Of the 10,000 colonies screened, 25 positive clones were
identified, and then confirmed by Southern analysis to contain the same 4.6-kb BamHI
hybridizing band. The 4.6-kb DNA fragment was subcloned, and the nucleotide sequence
of a 1,761-bp BamHI-SalI region was determined (SEQ ID NO. 3).
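The degeneracy of the three primers above (the number of distinct oligonucleotides each degenerate sequence represents) can be counted with a short sketch. It uses the base legend exactly as listed in the text, in which H stands for C+A only; this is narrower than the usual IUPAC meaning of H and is treated here as the authors' convention. The names LEGEND, PRIMERS and degeneracy are hypothetical and serve only this illustration.

```python
from math import prod

# Degenerate-base legend as listed in the text (H = C+A per the text).
LEGEND = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "H": "CA", "N": "ACTG", "Y": "CT", "K": "GT", "R": "AG", "W": "TA",
}

# Primer sequences copied from the text, written 5' to 3'.
PRIMERS = {
    "KEA-1": "T GCA GCA GAA CAG GAG GCK NYC CCA NKG",
    "KEA-2": "TG GGT CAG CGG GTA CCA NRC YTT RWA",
    "THC": "C GGC ATG GTC GGC TCC HTN ACN CAY TG",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by a degenerate primer."""
    bases = primer.replace(" ", "")
    return prod(len(LEGEND[base]) for base in bases)


for name, seq in PRIMERS.items():
    print(name, len(seq.replace(" ", "")), degeneracy(seq))
```

In each primer the degenerate positions are confined to the 3' end, with a fixed 5' clamp, which is the layout the CODEHOP strategy is built around.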

Sequence analysis of the pptA locus.

The sequence of the 1,761-bp BamHI-SalI fragment was analyzed for coding
regions by using the CODONPREFERENCE and TESTCODE programs of the GCG
package (Genetics Computer Group, Madison, Wisconsin). Two complete ORFs (pptA,
orf3) and two incomplete ORFs (orf1, orf4) were identified within the sequenced region
(Figure 13). The first ORF from left to right (designated orf1) starts outside the analyzed area
and ends with a TGA codon at position 248 of the sequenced fragment. Comparison of the
deduced product of orf1 with proteins in databases showed similarities with Rv2795c from
Mycobacterium tuberculosis (GenBank AL008967) and SC5A7.22 from S. coelicolor
(GenBank AL031107), both of unknown function. The second ORF, pptA, contains the
sequence amplified by PCR and used for the cloning of this locus. It comprises 741
nucleotides, starting with a GTG codon (position 245) which is coupled to the stop codon of
orf1, and ending with a TAA codon. The starting codon of pptA is preceded by a potential
ribosomal binding site (RBS), GGGAG. The overall (76.6%) and third codon position
(93.9%) G+C contents and the codon usage of pptA are similar to those found in other
Streptomyces genes, with the exception of the stop codon (TAA), which is most uncommon
in this group of organisms (Wright et al. Gene (1992) 113:55-65). The pptA gene encodes a
protein of 246 amino acids with a predicted molecular mass of 25,619 Da and a pI of 4.76,
which contains the conserved PPTase motifs. Database searches showed
significant similarities to the putative actinomycete PPTases (39.52%« 8 .6, A
identity/similarity) and to confirmed bacterial PPTases such as EntD from E. coli
(Lambalot et al. Chem. Biol. (1996) 3:923-936). The third
ORF, orf3, is separated from pptA by an apparently noncoding DNA region of 133 bp, and
is transcribed in opposite and convergent direction with respect to orf1-pptA. The gene
comprises 240 nucleotides, starting with an ATG codon (position 1,358) and ending with
TGA. The starting codon of orf3 is preceded by the sequence GAAGG, a potential RBS.
The deduced product of orf3 is a protein of 79 amino acids with a predicted mass of
7,555 Da and a pI of 7. .7. The Orf3 protein shows similarities to the N-terminal region of
SC5H1.35c, a protein of unknown function from S. coelicolor (GenBank AL.49863).
Analysis of Orf3 with the SignalP program (Nielsen et al. Protein Eng. (1997) 10:1-6)
predicts an N-terminal signal peptide which would be cleaved between residues 27 and 28
(ALA-DS), suggesting that the mature protein (52 amino acids, 5,099 Da, pI 4.31) would be
secreted. Between orf3 and orf4 there is an apparently noncoding region of 251 bp.
The orf4 gene is transcribed in opposite and divergent direction with respect to orf3. It starts
with an ATG codon at position 1610, preceded by a potential RBS (GGAGG), and
continues beyond the end of the sequenced fragment. The deduced protein product (50 amino
acids) of the incomplete orf4 contains a potential NAD/FAD binding motif, GXGX.GX.GX.G
(Scrutton et al. Nature (1990) 343:38-43), showing low similarities to diverse oxidoreductases.
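The overall and third-codon-position G+C contents quoted above for pptA are computed in the usual way from an in-frame coding sequence. The sketch below shows the calculation on a short made-up sequence (toy_cds is a hypothetical example, not the pptA sequence), and the function names are likewise illustrative only.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def gc3_fraction(cds: str) -> float:
    """Fraction of G and C bases at the third position of each codon.

    The sequence is assumed to be in frame and a multiple of three long.
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    third_positions = cds.upper()[2::3]
    return gc_fraction(third_positions)


# Hypothetical four-codon example: GTG ACC GGC TAA
toy_cds = "GTGACCGGCTAA"
print(round(100 * gc_fraction(toy_cds), 1))   # overall G+C, percent
print(round(100 * gc3_fraction(toy_cds), 1))  # third-position G+C, percent
```

A third-position value well above the overall value, as reported for pptA, is the pattern typically seen in high-G+C Streptomyces coding regions.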

Overproduction and biochemical characterization of PptA.

In order to test if pptA actually encodes a functional PPTase, we decided to
overproduce and purify the PptA protein, and assay its catalytic competence on putative
substrate proteins or domains. The pptA coding sequence was amplified by PCR and cloned
into the T5 promoter-based pQE-70 vector, yielding plasmid pQEPPT, in such a way that a
hexahistidine tag would be added at the C-terminus of the protein. Expression of the
pQEPPT construct in E. coli M15(pREP4) resulted in the overproduction of soluble
His-tagged PptA which was readily purified by affinity chromatography on Ni-NTA agarose
under non-denaturing conditions (FIGURE). Because PptA belongs by sequence similarity
to the subfamily of PPTases involved in nonribosomal peptide synthesis, we tested its
activity using two different apo-PCPs as protein substrates. The first one, BlmI, has been
previously characterized in our laboratory as a discrete peptidyl carrier protein or type II
PCP, whose gene is found within the bleomycin-biosynthesis gene cluster of S. verticillus
(Du et al. Chem. Biol. (1999) 6:507-517). For the second PCP substrate we used BlmX, a
bimodular NRPS protein encoded in the same cluster (Fig. 2), as a source of a type I PCP,
i.e. a PCP included in a multidomain NRPS. For the production of this type I PCP, we
amplified by PCR a 1,898 bp fragment encoding the adenylation and PCP domains from the
second module of BlmX. This DNA fragment was cloned into pMAL-c2x to yield
pMAL1617, in which the type I PCP would be produced as a maltose-binding protein (MBP)
fusion, MBlmX-2, with a predicted molecular mass of 108.5 kDa. Introduction of
pMAL1617 in E. coli TB1 resulted in good overproduction of MBlmX-2, about 40%
soluble, which was purified by affinity chromatography using amylose resin. To test the
PPTase activity, we incubated the purified PptA with BlmI and MBlmX-2 as putative protein
substrates in the presence of [3H]-(pantetheinyl)-CoASH, and the tritiated products were
subjected to SDS electrophoresis and autoradiography. The well-characterized PPTase Sfp
from B. subtilis, which exhibits a broad specificity for its protein substrate (Quadri et al.
Biochemistry (1998) 37:1585-1595), was included as a positive control. In these
experiments PptA exhibited a robust phosphopantetheinylation activity on both BlmI and
MBlmX-2. Having demonstrated that PptA does in fact have PPTase activity on both type I
and type II PCP substrates from nonribosomal peptide synthetases, we then proceeded to test
two different acyl carrier proteins (ACPs) as potential substrates. The first one, BlmVIII, is
a monomodular multidomain polyketide synthase (PKS) which is encoded in the
bleomycin-biosynthesis gene cluster of S. verticillus (Fig. 2). BlmVIII contains an ACP domain at its
C-terminus, that is, a type I ACP. For the second ACP substrate we used TcmM, a type II
acyl carrier protein involved in the biosynthesis of the aromatic polyketide tetracenomycin C
in S. glaucescens (Shen et al. J. Bacteriol. (1992) 174:3818-3821; Bao et al. Biochemistry
(1998) 37: 8132-8138). For the production of TcmM, its coding sequence was transferred
from a construct previously made in pET-22b (Gehring et al. Chem. Biol. (1997) 4:17-24)
into the pET-28a vector to yield pET28a-TcmM, in such a way that a hexahistidine tag
should be added at both the N-terminus and the C-terminus of the protein. Plasmid
pET28a-TcmM was introduced into E. coli BL21(DE3), and TcmM was easily purified by affinity
chromatography using Ni-NTA resin. In vitro phosphopantetheinylation assays were
performed as before, but using BlmVIII and TcmM as protein substrates, and PptA was able
to posttranslationally modify both ACP substrates.
The pptA gene is not clustered with the bleomycin-biosynthesis locus.

Some bacterial PPTase genes have been found clustered, or close, to their
respective "partner" NRPS genes: entD (enterobactin (Coderre et al. J. Gen. Microbiol.
(1989) 135:3043-3055)), sfp (surfactin (Cosmina et al. Mol. Microbiol. (1993) 8:821-831)),
gsp (gramicidin (Borchert et al. J. Bacteriol. (1994) 176:2458-2462)), bli
(bacitracin (Gaidenko et al. Biotechnologia (1992) 13-19)), lpa-14 (iturin (Huang et al. J.
Ferment. Bioeng. (1993) 76:445-450)). To test the possible clustering of pptA to the
bleomycin-biosynthesis (blm) locus, PCR reactions were performed using the THC/KEA-2
primers on several overlapping cosmid clones spanning the blm locus plus 30-40 kb
upstream and downstream of its putative limits. No amplification product could be obtained
in these reactions, showing that the pptA gene is not clustered with the blm locus.

Discussion 

It has been suggested that in organisms containing multiple 
phosphopantetheine-requiring pathways, each pathway has its own posttranslational 
modifying activity (Walsh et al. Curr. Opin. Chem. Biol. (1997) 1:309-315). Our group 
has found that S. verticillus ATCC15003 contains several PKS and NRPS gene clusters, one 
of them being responsible for bleomycin production (a hybrid NRPS/PKS system (Shen et 
al. Bioorg. Chem. (1999) 27:155-171; Du et al. Chem. Biol. (1999) 6:507-517)). This 
suggested that the gene encoding the PPTase for the BLM NRPS could also be clustered, or 
close, to the NRPS genes. However, we have not found this gene after sequencing almost 
the whole blm NRPS locus. Because having this gene could be important for us in order to 
express functional NRPS modules from the blm cluster, we decided to clone the PPTase 
gene. Additionally, if the "one NRPS cluster - one PPTase" hypothesis was true, it seemed 
possible to use PPTase sequences as a new kind of probe to clone novel NRPS clusters. 

We know that in S. verticillus there are several NRPS loci (maybe four), so 
we expected several "PCP-type" PPTases. However, we have amplified only one, and it does 
not seem to be closely linked to any of the NRPS loci. Interestingly, in the actinomycete 
Mycobacterium tuberculosis, whose genome is fully sequenced, there is only one PCP-type 
PPTase gene, which is not clustered with either of the two NRPS loci present in this organism 
(Quadri et al. Chem. Biol. (1998) 5:631-645). These and other lines of indirect evidence suggest 
that the idea of cluster-specific PPTases is not the general rule at all, but most probably the 
exception, especially in organisms containing multiple NRPS clusters. There is also strong 
evidence that at least some PCP-type PPTases can posttranslationally modify PCPs from 
different clusters and even different organisms (Quadri et al. Chem. Biol. (1998) 5:631-645; 
Gehring et al. Biochemistry (1998) 37:11637-11650). It is most likely that there is only one 
PCP-type PPTase in S. verticillus and that its gene is not necessarily clustered to any of the 
NRPS loci. 

Biochemical characterization of the purified PptA protein confirmed not only 
its PPTase activity but also its broad specificity, comparable to that of Sfp. Different apo- 
PCPs (type I and type II) and a type-I apo-ACP from the bleomycin synthetase, and the type- 
II apo-ACP from the tetracenomycin PKS of Streptomyces glaucescens were efficiently used 
as substrates by PptA. These results suggest PptA as a good candidate for heterologous 
coexpression with NRPS and PKS genes to overproduce active holo-synthase enzymes. 

It is understood that the examples and embodiments described herein are for 
illustrative purposes only and that various modifications or changes in light thereof will be 
suggested to persons skilled in the art and are to be included within the spirit and purview of 
this application and scope of the appended claims. All publications, patents, and patent 
applications cited herein are hereby incorporated by reference in their entirety for all 
purposes. 

CLAIMS 



What is claimed is: 

1. An isolated nucleic acid comprising a nucleic acid selected from the 
group consisting of 

a nucleic acid encoding any one of Blm open reading frames (ORFs) 8 

through 41; 

a nucleic acid encoding a polypeptide encoded by any one of Blm 

open reading frames (ORFs) 8 through 41; and 

a nucleic acid amplified by polymerase chain reaction (PCR) using 
any one of the primer pairs identified in Table II and the nucleic acid of a bleomycin- 
producing organism as a template. 

2. The isolated nucleic acid of claim 1, wherein said nucleic acid 
comprises a nucleic acid encoding at least two open reading frames selected from the group 
consisting of Blm open reading frames 8 through 41. 

3. The isolated nucleic acid of claim 1, wherein said nucleic acid 
comprises a nucleic acid encoding at least three open reading frames selected from the group 
consisting of Blm open reading frames 8 through 41. 

4. The isolated nucleic acid of claim 1, wherein said nucleic acid 
comprises a nucleic acid encoding a C domain lacking one or more His residues of the 
conserved HHxxxDG active site for transpeptidation. 

5. The isolated nucleic acid of claim 1, wherein said nucleic acid 
comprises a nucleic acid encoding a protein encoded by a gene selected from the group 
consisting of blmI, blmII, and blmXI. 

6. An isolated nucleic acid comprising a nucleic acid encoding a module 
comprising two or more catalytic domains of a protein encoded by a nucleic acid of a 
bleomycin gene cluster wherein said catalytic domains are selected from the group consisting 
of a condensation (C) domain, an adenylation (A) domain, a peptidyl carrier protein (PCP) 
domain, a condensation/cyclization domain (Cy), an acyl-carrier protein (ACP)-like domain, 
an oxidization domain (Ox), a ketoacyl synthase (KS) domain, an acetyl transferase (AT) 
domain, a ketoreductase (KR) domain, and a methyltransferase (MT) domain. 

7. The isolated nucleic acid of claim 6, wherein said nucleic acid 
comprises a nucleic acid encoding one or more proteins comprising a module selected from 
the group consisting of NRPS-0, NRPS-1, NRPS-2, NRPS-3, NRPS-4, NRPS-5, NRPS-6, 
NRPS-7, NRPS-7, NRPS-9, and PKS. 

8. The isolated nucleic acid of claim 7, wherein said nucleic acid 
comprises an open reading frame from SEQ ID NO: 1, SEQ ID NO: 2, or SEQ ID NO: 3. 

9. An isolated nucleic acid comprising a nucleic acid encoding a protein 
encoded by a gene from a BLM gene cluster. 

10. The nucleic acid of claim 9, wherein said nucleic acid comprises a 
nucleic acid encoding a protein encoded by a gene selected from the group consisting of 
blmI, blmII, and blmXI. 

11. The nucleic acid of claim 9, wherein said nucleic acid comprises a 
nucleic acid encoding a protein encoded by a gene selected from the group consisting of 
blmIII, blmIV, blmV, blmVI, blmVII, blmIX, and blmX. 

12. The nucleic acid of claim 9, wherein said nucleic acid comprises a 
nucleic acid encoding a protein encoded by blmVIII. 

13. The nucleic acid of claim 9, wherein said nucleic acid comprises a 
nucleic acid selected from the group consisting of blmI, blmII, and blmXI. 

14. The nucleic acid of claim 9, wherein said nucleic acid comprises a 
nucleic acid selected from the group consisting of blmIII, blmIV, blmV, blmVI, blmVII, 
blmIX, and blmX. 

15. The nucleic acid of claim 9, wherein said nucleic acid comprises 
blmVIII. 

16. An isolated nucleic acid comprising a nucleic acid that encodes a 
protein comprising at least one catalytic domain selected from the group consisting of a 
condensation (C) domain, an adenylation (A) domain, a peptidyl carrier protein (PCP) 
domain, a condensation/cyclization domain (Cy), an acyl-carrier protein (ACP)-like domain, 
an oxidization domain (Ox), a ketoacyl synthase (KS) domain, an acetyl transferase (AT) 
domain, a ketoreductase (KR) domain, and a methyltransferase (MT) domain, and that 
hybridizes to a nucleic acid selected from the group consisting of orf8, orf9, orf10, orf11, 
orf12, orf13, orf14, orf15, orf16, orf17, orf18, orf19, orf20, orf21, orf22, orf23, orf24, 
orf25, orf26, orf27, orf28, orf29, orf30, orf31, orf32, orf33, orf34, orf35, orf36, orf37, orf38, 
orf39, orf40, and orf41 under stringent conditions. 

17. The nucleic acid of claim 16, wherein said isolated nucleic acid 
comprises a nucleic acid encoding a module. 

18. The nucleic acid of claim 16, wherein said isolated nucleic acid 
comprises a nucleic acid encoding a BLM gene. 

19. An isolated nucleic acid comprising a nucleic acid selected from the 
group consisting of orf8, orf9, orf10, orf11, orf12, orf13, orf14, orf15, 
orf16, orf17, orf18, orf19, orf20, orf21, orf22, orf23, orf24, orf25, orf26, orf27, orf28, orf29, 
orf30, orf31, orf32, orf33, orf34, orf35, orf36, orf37, orf38, orf39, orf40, and orf41, or an 
allelic variant thereof. 

20. The nucleic acid of claim 19, wherein said nucleic acid comprises a 
nucleic acid that is a single nucleotide polymorphism (SNP) of a nucleic acid selected from 
the group consisting of orf8, orf9, orf10, orf11, orf12, orf13, orf14, orf15, 
orf16, orf17, orf18, orf19, orf20, orf21, orf22, orf23, orf24, orf25, orf26, orf27, orf28, 
orf29, orf30, orf31, orf32, orf33, orf34, orf35, orf36, orf37, orf38, orf39, orf40, and orf41. 

21. An isolated gene cluster comprising open reading frames encoding 
polypeptides sufficient to direct the assembly of a bleomycin. 

22. An isolated multi-functional protein complex comprising both a 
polyketide synthase (PKS) and a peptide synthetase (NRPS). 

23. An isolated nucleic acid encoding a multi-functional protein complex 
comprising both a polyketide synthase (PKS) and a peptide synthetase (NRPS). 

24. An isolated polypeptide comprising a catalytic domain encoded by a 
nucleic acid of a bleomycin gene cluster wherein said nucleic acid comprises a nucleic acid 
selected from the group consisting of 

a nucleic acid encoding any one of Blm open reading frames (ORFs) 8 
through 41; and 

a nucleic acid amplified by polymerase chain reaction (PCR) using 
any one of the primer pairs identified in Table II. 

25. The polypeptide of claim 25, wherein said polypeptide comprises an 
enzymatic domain selected from the group consisting of a condensation (C) domain, an 
adenylation (A) domain, a peptidyl carrier protein (PCP) domain, a condensation/cyclization 
domain (Cy), an acyl-carrier protein (ACP)-like domain, an oxidization domain (Ox), a 
ketoacyl synthase (KS) domain, an acetyl transferase (AT) domain, a ketoreductase (KR) 
domain, and a methyltransferase (MT) domain. 

26. The polypeptide of claim 25, wherein the nucleic acid of a bleomycin 
gene cluster comprises a nucleic acid encoding at least two open reading frames selected 
from the group consisting of Blm open reading frames 8 through 41. 

27. The polypeptide of claim 25, wherein said nucleic acid of a bleomycin 
gene cluster comprises a nucleic acid encoding at least three open reading frames selected 
from the group consisting of Blm open reading frames 8 through 41. 

28. The polypeptide of claim 25, wherein said polypeptide comprises a C 
domain lacking one or more His residues of the conserved HHxxxDG active site for 
transpeptidation. 

29. The polypeptide of claim 25, wherein said polypeptide is a polypeptide 
encoded by a gene selected from the group consisting of blmI, blmII, and blmXI. 

30. An isolated polypeptide comprising a module comprising two or more 
catalytic domains of a protein encoded by a nucleic acid of a bleomycin gene cluster wherein 
said catalytic domains are selected from the group consisting of a condensation (C) domain, 
an adenylation (A) domain, a peptidyl carrier protein (PCP) domain, a 
condensation/cyclization domain (Cy), an acyl-carrier protein (ACP)-like domain, an 
oxidization domain (Ox), a ketoacyl synthase (KS) domain, an acetyl transferase (AT) 
domain, a ketoreductase (KR) domain, and a methyltransferase (MT) domain. 

31. The polypeptide of claim 30, wherein said polypeptide comprises a 
module selected from the group consisting of NRPS-0, NRPS-1, NRPS-2, NRPS-3, NRPS-4, 
NRPS-5, NRPS-6, NRPS-7, NRPS-7, NRPS-9, and PKS. 

32. An isolated polypeptide encoded by a gene from a BLM gene cluster. 

33. The polypeptide of claim 32, wherein said polypeptide is encoded by a 
gene selected from the group consisting of blmI, blmII, and blmXI. 

34. The polypeptide of claim 32, wherein said nucleic acid comprises a 
nucleic acid encoding a protein encoded by a gene selected from the group consisting of 
blmIII, blmIV, blmV, blmVI, blmVII, blmIX, and blmX. 

35. The polypeptide of claim 32, wherein said polypeptide is encoded by 
blmVIII. 

36. An isolated polypeptide comprising a module wherein said module is 
specifically bound by an antibody that specifically binds to a BLM module selected from the 
group consisting of NRPS-0, NRPS-1, NRPS-2, NRPS-3, NRPS-4, NRPS-5, NRPS-6, 
NRPS-7, NRPS-7, NRPS-9, and PKS. 

37. The polypeptide of claim 36, wherein said polypeptide is specifically 
bound by an antibody that specifically binds to a polypeptide encoded by a gene selected 
from the group consisting of blmI, blmII, blmXI, blmIII, blmIV, blmV, blmVI, blmVII, 
blmIX, blmX, and blmVIII. 

38. An isolated polypeptide comprising a polypeptide encoded by an open 
reading frame of a nucleic acid selected from the group consisting of SEQ ID NO:1, SEQ ID 
NO:2, and SEQ ID NO:3, or an allelic variant thereof. 

39. The polypeptide of claim 38, wherein said nucleic acid comprises a 
single nucleotide polymorphism (SNP) of an open reading frame of a nucleic acid selected from the 
group consisting of SEQ ID NO:1, SEQ ID NO:2, and SEQ ID NO:3. 

40. An expression vector comprising a nucleic acid of any one of claims 1 
through 23. 

41. A host cell transformed with an expression vector of claim 40. 

42. The host cell of claim 41, wherein said cell is transformed with an 
exogenous nucleic acid comprising a gene cluster encoding polypeptides sufficient to direct 
the assembly of a bleomycin or bleomycin analog. 

43. The cell of claim 41, wherein said cell is a bacterial cell. 

44. The cell of claim 43, wherein said cell is a Streptomyces cell. 

45. The cell of claim 41, wherein said cell is a eukaryotic cell. 

46. A method of chemically modifying a biological molecule, said method 
comprising contacting a biological molecule that is a substrate for a polypeptide encoded by 
one or more bleomycin biosynthesis gene cluster open reading frames with the polypeptide 
encoded by one or more bleomycin biosynthesis gene cluster open reading frames, whereby 
said polypeptide chemically modifies said biological molecule. 

47. The method of claim 46, wherein said method comprises contacting 
said biological molecule with at least two different polypeptides encoded by blm gene cluster 
open reading frames. 

48. The method of claim 46, wherein said method comprises contacting 
said biological molecule with at least three different polypeptides encoded by blm gene 
cluster open reading frames. 

49. The method of claim 46, wherein said contacting is in a host cell. 

50. The method of claim 49, wherein said host cell is a bacterium. 

51. The method of claim 46, wherein said contacting is ex vivo. 

52. The method of claim 46, wherein said biological molecule is an 
endogenous metabolite produced by said host cell. 

53. The method of claim 46, wherein said biological molecule is an 
exogenously supplied metabolite. 

54. The method of claim 46, wherein said host cell is a eukaryotic cell. 

55. The method of claim 54, wherein said eukaryotic cell is selected from 
the group consisting of a mammalian cell, a yeast cell, a plant cell, a fungal cell, and an 
insect cell. 

56. The method of claim 46, wherein said biological molecule is an amino 
acid and said polypeptide is a peptide synthetase. 

57. The method of claim 46, wherein said polypeptide is a methyl 
transferase. 

58. A method of coupling a first amino acid to a second amino acid, said 
method comprising contacting the first and second amino acid with a recombinantly 
expressed bleomycin nonribosomal peptide synthetase (NRPS). 

59. The method of claim 64, wherein said NRPS is selected from the 
group consisting of NRPS-5, NRPS-4, NRPS-3, NRPS-9, NRPS-8, and NRPS-7. 

60. The method of claim 64, wherein said NRPS is selected from the 
group consisting of NRPS-6, NRPS-2, NRPS-1, and NRPS-0. 

61. The method of claim 64, wherein said contacting is in a host cell. 

62. A method of coupling a first fatty acid to a second fatty acid, said 
method comprising contacting the first and second fatty acids with a recombinantly 
expressed bleomycin polyketide synthase (PKS). 

63. The method of claim 62, wherein said contacting is in a host cell. 

64. A method of producing a bleomycin or bleomycin analog, said method 
comprising: 

providing a cell transformed with an exogenous nucleic acid 
comprising a bleomycin gene cluster encoding polypeptides sufficient to direct the assembly 
of said bleomycin or bleomycin analog; 

culturing the cell under conditions permitting the biosynthesis of 
bleomycin or bleomycin analog; and 

isolating said bleomycin or bleomycin analog from said cell. 

65. An isolated nucleic acid comprising a nucleic acid encoding a 
phosphopantetheinyl transferase, said nucleic acid encoding a phosphopantetheinyl 
transferase being selected from the group consisting of: 

a nucleic acid encoding the protein encoded by the nucleic acid of 

SEQ ID NO:3; 

a nucleic acid amplified by polymerase chain reaction (PCR) using 
primers that specifically amplify ORF 41 (primers: SEQ ID NO:71 and SEQ ID NO:72) and 
Streptomyces nucleic acid as a template; 

a nucleic acid encoding a polypeptide having phosphopantetheinyl 
transferase activity where said nucleic acid specifically hybridizes to the nucleic acid of SEQ 
ID NO: 3 under stringent conditions. 

66. The nucleic acid of claim 65, said nucleic acid comprising a nucleic 
acid of SEQ ID NO:3. 

67. A polypeptide comprising a phosphopantetheinyl transferase encoded 
by SEQ ID NO:3. 

68. A vector comprising the nucleic acid of claim 66. 

69. A cell transfected with the vector of claim 68. 

70. A method of converting an apo-carrier protein to a holo-carrier protein 
comprising reacting said apo-carrier protein with a recombinant phosphopantetheinyl 
transferase encoded by SEQ ID NO:3 and coenzyme A thereby producing a holo-carrier 
protein. 

71. A cell comprising a modified bleomycin gene cluster nucleic acid, 
said cell producing elevated amounts of bleomycin as compared to the wild type cell. 

72. The cell of claim 71, wherein said cell overexpresses a resistance gene 
from the bleomycin gene cluster. 

73. The cell of claim 72, wherein said resistance gene is a gene listed in 
Table III. 

SEQUENCE LISTING 



SEQ ID NO: 1 BLM gene cluster ORFs 30 through 8 

(note: orf 31-40 on sequence 1-18660 are translated on the reverse strand) 
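
Because some ORFs are annotated on the reverse strand, their protein sequence is read from the reverse complement of the listed strand. A minimal sketch of that operation follows. It assumes the standard genetic code and 1-based inclusive coordinates on the listed strand; the helper names and the toy sequence are hypothetical and are not taken from the listing.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def translate(dna):
    """Translate in frame 1; codons with unexpected characters become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def translate_reverse_orf(listed_strand, start, end):
    """Translate an ORF on the reverse strand.

    start and end are 1-based inclusive coordinates on the listed strand,
    with start < end; reading begins at `end` and runs toward `start`.
    """
    region = listed_strand[start - 1:end]
    return translate(reverse_complement(region))


# Toy example (invented): the reverse strand of bases 3-14 encodes Met-Ala-Met-stop.
toy = "GG" + "TTACATGGCCAT" + "CC"
print(translate_reverse_orf(toy, 3, 14))  # MAM*
```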

llM1 0 ™™»c fT ™co^ 

E » E V S A . ^ s s L ^ , „ . 0 x , V R S «.r<a»> 

„„ l^^^^r^YTf^ 15320 

iml ccl™^"^^ 1,3,0 

„,„ z^^r^^^^^^^ 19,40 

1W1 Jcll^-wr^^ 15500 

„ M1 J™^,™^^ 19560 



WO 00/40704 PCT/US00/00445 




19981 CCGTCGCGG^ 

20041 acItctc^ 

! CcllclcTCAACGT^ 

rTAVNVRTOAo 

, CCGCG f CCTC^^ 

ccgcgLc^ 

GGGCGATGGCCCGCGCCA^ 

ccIcgggccgtggcc^^^ 

a CCGGCGACCACAGCGCCGAGGAACA^^ 

GDHSAEEHALLAbHX 

(1 TCGACGGTCCG^ 

clcGTGCGCGTGGTGAT^ 

ccgggLcgccccogccagcgccg^ 

207S1 GG^GCC^ 

2082 x acctgLcLgccggtg^^ 

caccgtgcgctccgagtgcccgacgatgaggagccgaccgaggtcctcctccgtcacccg 

pcapsarr* 
gacaccc^cggttgcgccgccccatgccggcggtcccicctgacggcccgtccgcggctt 

gaggcggcggtggacggcctgccgccgccggcctcgggctgatcggcgtgatcaccgccc 

atgcgcosgtc^cgcccgcggcatcgtc^cgggactotgttcccggccaccgc^ 

ccggcctcgcgctgkcgWtgccgcggtc^ 

ggcctgtgcttcttcccgcccgtccggcgggtggcgccgcgccggcggtgacagggaaat 

ATGACCGGAACTGGGATGcicGCGTCCACTCGGGTGTGTTTAAGTGCCACGGGGGCTTCC 

gacggcgcgtcgcgcgccggcggttcgcccgatgatggtcgtgcggcgctgtgagccggg 

GAGCCTATGGCACAGGACCTGAACGACTGGATCGAGGACGAGGTCGTCCCTTACGAGGAG 

MA QDLNDWlEDEVVf 

AAGCCTCTCGAATGGATC^^^ 

GTCGATCACACCTACTTCTTCTC^CCGGCCGATGGCGCGATCGTCTACCAG^ 
VDHTVFFSPADGAIViW 

GATCCCCAGGAGTCGATCATCGACATCAAGGGGAAGCCGTACTCGCTGGCCGCCG^ 

D pQESlID IKL, ^ rx 




CGTGACGAATCGTTCGGTCACCGGTG^^ 
RDESFGHRCLVIGlt" 

GTGCACATC^C^ 

gggacgt^ 

s4444^ 21840 

CCgLcJgG^ 21900 
LcCCGTTC^ 21960 

21961 cgcLLIg^^ 22020 

22021 gJJccLL 22080 
220 81 cgttgaagagcgcgtacgaagcgatggcgaactggagggacacagcgtgggtttccgtcg 

(orf2 

AGCGCAGAGGGCCGGTGGGCCG^ 22200 

a q r a g g p 





AQRAGGPGAGK**- — 

" CGGGCCGTCGGCGCCG^^ 

;1 GGTGTTTGACAAGCT^^ 

VFDKLVDGDLSHQi'Ax 

22321 CGGCCCGCTGAACACCGCCGCCCTGCGGATGGCCTACGCCCGGCTGGTGCGGCGCCA 

GPLNTAALRMAi^^ 



CGGCCCGCTGAACACCGCCGCCCTGOGGATC^ 

GTGCCTGCGCACCCGCTTCCCCGTGATCGACGGGGAGCCCGTG^GGTGATCGAGGGCAT 
CLRTRFPVIDGEPVQVir. 

CGGGAAAGCAGCGGGGGGCCCGCTGCCGCTCATCGATCTGCGCCACCTCCCGGAGGCGCT 
GKAAGGPLPLIDLRH^ 

TCGCGCGCGCGAGATCGCGAGGATCCGCGAGGAGACGCTGTCCACGCCGGTCCCCTTCGA 
RAREIARIREETLSTt' 

CAAGCGGCCGCCCGTCCGCGTGGCGCTGATCCGGGCGGCGCCCGAGGAGCACCTCTTCCT 

KRPPVRVALlRAAf^ 
CGTCGGC^TCCCGCACATCACCGCGGACCTGTGGTCCG^CGACCCTGCTCAACGACGAGCT 
VGIPHITADLWSAI^ 

CATGGCGCACTACAGGGCGGGGGCCGAGMGACTCCCTCCCGGG 

gIta^ 

cIggccggacggtggcgggcgcggctggacgggctgtccgccgt^ 

EAGRWRARLDGLSAVEbf 

22861 CCGGCCCCGCCCCGCCK^^ 

RPRPAGRRRDCFLIGU 

CGAACTGAGCGACCGGCTGCGCGCCTTGGCACGCACCGCC^ 

ELSDRLR ALART 

aCTGCTGGCGGCGTTCCACTGGCTGGTGGGGCGGATGTCGGG^^ 
LLAAFHWLVGRMSGAOK 

CACCTCGCTCGTGGCCGCCCGGCACGGCAGCGCGCTACAGGGGATGACCCGCCCGTTCTC 



TSLVAARHGSAVQGMTGPFS 

23X01 GGACTACCTGGCCC™ 23160 

23161 CCGCGTACGCGAC^ 23220 

23221 CCTCG^GTCATGGACCCCG^^ 23280 

2328 1 CAACCTCCACAACATCCCTCCCGCG^ 23340 

233 41 GGTGAACCCGGAGG^GGACGACGGGGAGAGCGGCGACGGGGAGTACG 23400 

23 401 CGACCTGACCTTCGACGT^ 23460 
DLTFDVYDYGTGHM f c uv 

23 461 CGACCGGCGGCTGGCCGATC^ 23520 

2352 1 GCTCCGTGCGGTCGTCGCCGACCCCGGCGTGCGCCTGTCCGCCCTCGGCACCCTGCTGTC 23580 
LRAVVADPGVRLSALO i 1j 

23581 CCTGCCGCGACCGCCGTCCGCCACGTCCTTCGGCGGCCGGGAGATCGACGTCCGGCGCGT 23640 

LPRPP SATSFGG 

23641 CGAACGCGAGTTGGCG^C^^^ ^3700 

23701 GCGCCTGGCCACCGGGCTGCGCGTACGGGAACTGGTCGCCTACTGCGCCGTCGAGGGCAC 23760 
RLATGLRVRELVAYCAvnu 

23 761 GCC^CGTCCGAACGC^CCCACGACATCCGCGGCCGCCTGCGGGAGCGCCTGCCC^ 23820 
pRPNAAHDIRGRLRERbfuu 

23821 CTGGGTGCCGACCGTGTTCGTCGAGCGCCCGCCGGA 2 3 880 

23881 CCGGGCGGCGGGCGGCG^ 23940 

23941 TCCCGAGGAGGGCCGGCCCCCCTCGGACCCGTCCGAGCGGCGGCTGGCCGCGCTCTGGGC 24000 
pEEGRPPSDPSERRLAAi." 

24001 CGAGATCCTG^ 24060 

2406! CGATAAGGACGOTCTCOS^^ 24120 

2412! cIgCCGA^ 24180 

24181 ACGGAGGGTGTAACGCGCAATGAGTGAGTGGTAGGGTCGGAATCGAACGGCACTGATCGG 24240 
R R V * 

24241 CAATCTTTTTOGTCAGCTGTTCC^ATAT^CCGGGGCGCGTC<3GCGCTCCCTCGACCAAG 24300 

24301 GGCGTACGCGGATAAGCGTGCGCCGCCCCACGGCTGCGTCTCGACGCCTTCATCGGCGCG 24360 

24361 TCGGACACTTCGCGGTGCCAGTCGGCACGCTCAGAGATCAGTGGAATGCCTCGGTGTGCC 24420 

24421 CGAGgWtCAGTACTGCTGTCCACACAACG^C^^^ 2448 

RGALSTAVHTTRQGSWN vrac 

2 4481 ACGGCGAATTCCGGCTATCGGGTCTCACCTCAGCAG^ 2454 
TANSGYRVSPQQRHLWAML.1 

24541 CGCGGGCGGGACGGCGGGCGACGTGCGTTCACCCTVGTCCGCCGTGGTGGTCGACC^ 24600 
RGRDGGRRAFTQ^SAVVVDRS 



(orf26) 

246 oi ctggacgc^ 24660 
24661 LLaccttcaccggt^^ 

2472 1 G^CGAGCAGCCGCTG^^ 

24781 gaactg^ 24840 






2484 1 TICOBGOCC^ 



GCCCTGGTCGCGGACCGGCTCTCCCTCCGGCTGCTGGCCGGGCAGATCCTCGCGGCGTAC 

alvadrls lrll 

4961 AGCGGGGAGACCGTGTCCCCCGATGGCCCGCCGCCCTTGCAGTACGCCGACTTCGCCG^ 

SGETVSPDOfffu* 

5021 TGGCACCACGACCTGCTCACCGCCGAGGA^ 



2 S081 ^CACCGCCAC^ 

25141 GGTCOGTGGCGGGCGCGGGAGTGGGAACT 

25201 G T CGCCGGG r C^ 

25251 GTCTC^CGGCTCGCCGGCGAG^^ 

25321 CACCCCGAACTCCGCAC^^ 

HPELRTAlGAFbKn^ 

25 3S1 ATCCGTCACGAGACGGCGnCGCGGAAT^ 

2544 1 GGCGAGGAACTCCTCGACCATTCCGACCCGGAACTGCTCGGCAGCCT 
2 SS01 G^GGGCCCTGC^^ 
255 S1 ATCACC^ACCAC^ 

2562 1 CGACGCGACGGCGCCCGGCTCCGCATGGAACTGGGATACGACGAGGGCCGTATCGACGAG 
RRDGARLRM ELGY 

25 681 *«mCCCM^^ 

257 41 CCCGAGGGCCCGGTCGGCGACATCCGCATGCTGTCGGAOT 

pEGPVGDlRMLSDbiA^u 

2 5801 GAAGCGGGGCTGGGCCCCCGCGTGG^^ 

EAGLGPRVELPGKAvnE, 

25861 GAGCAGGCCGCGCGCACCCCCGGGGCGGTCGCGGTCAGCGCGGGCGAGGACGCCCTCACG 
EQAARTPGAVAV&Aoc 

25921 TACGCCGAACTCGACGAGCGGTCCAACCGCCTGGCACAC^CCTGACCGGGCTCGGGGTG 

YAELDERSNRLAHHL.1^" 

25981 ACACCCGG^ 

2604 1 CTCGG CGTG CTCAAGG CGGGTGG CG CCTTCGTCC^GTCG ACGTGGGCTT CCCCCG CAAA 



LGVLKAGGAFVPVDVGFPRK 

„„ ™=L^^ 26160 



26161 



2622l IZ^^fOo^^co^^c^^^ »- 

M401 L^^Tr 3 ^^™^*"™ 26460 

26161 "JJccU^c^^^^-TT^ 26520 

MM1 L^4^»r^ C ^ C "^^ C ^ 26,00 

„ i44444 K rfTTT o ?^? sG ?TTTT c 26020 
lij444t^^ c "^'^™ ctc?sc 26S " 

x Jcll™!^^ 2,000 



26881 



2694] 



R F Ij P 



2 ,ooi mjT^TfJ , v I. O E 

„.„ »™«™»^ 27120 

2 , 1S1 --K-^f-™? 0 !^^ 2 "" 

rai ^4444^??^^^^ 2,300 

2 , 361 l c444444TTrf^r!T^ 2,<2 ° 
SM1 14444444??^^?^??^ 27480 

D NYFVLGGDSIRt» vl 




2754 1 CAGGCCCGCGGGGTCGAGGTCACCGTGGCCGACCTGCACCGGCACCCCACCGTCCGGGCC 27600 
QARGVEVTVADLHRHPTVRA 
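
Each block of the listing pairs a 60-nucleotide line (start coordinate, sequence, end coordinate) with its conceptual translation, so a line can be checked for internal consistency. The sketch below is one way to do that, assuming the standard genetic code and a reading frame that begins at the first base of the line; the function names are hypothetical. It uses the line at 27541-27600 shown above as the example.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def translate(dna):
    """Translate in frame 1; codons with unexpected characters become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def line_is_consistent(start, dna, end, protein):
    """True if the coordinates match the sequence length and the listed
    translation matches the translation of the nucleotide line."""
    return len(dna) == end - start + 1 and translate(dna) == protein


good = "CAGGCCCGCGGGGTCGAGGTCACCGTGGCCGACCTGCACCGGCACCCCACCGTCCGGGCC"
print(line_is_consistent(27541, good, 27600, "QARGVEVTVADLHRHPTVRA"))  # True

# A damaged copy (two bases replaced by one stray character) fails the check.
damaged = good[:20] + "^" + good[22:]
print(line_is_consistent(27541, damaged, 27600, "QARGVEVTVADLHRHPTVRA"))  # False
```

A check of this kind flags lines in which scanning has dropped or substituted characters; it cannot by itself say which base was lost unless the translation line constrains it.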

27601 TGCGCCGCGCACCTGGACGCCCGCGAGGACCTGCCGCGGACGCCCGTCACCGAACCCTTC 27660 
CAAHLDAREDLPRTPVTEPF 

27661 GCGCTGATCTCCGCCGAGGACCGGGCGCTGGTGCCGGACGACGTCGAGGACGCCTTCCCG 27720 
ALISAEDRALVPDDVEDAFP 

27721 CTGAACCTGCTCCAGGAAGGCATGATCTTCCACCGCGACTTCGCGGCGAAGTCGGCCGTC 27780 
LNLLQEGMIFHRDFAAKSAV 

27781 TACCACGCCATCGCGTCCGTGCGGCTGCGCGCCCCGTTCGACCTCGCCGTGCTGCGGATG 27840 
YHAIASVRLRAPFDLAVLRM 

27841 GTCGTGCGCCAGCTCGTCGAGCGGCACCCGATGCTGCGCACCTCCTTCGACATGAGCCGC 27900 
VVRQLVERHPMLRTSFDMSR 

27901 TTCAGCCGCCCGCTGCAACTGGTGCACCGCGAGTTCGCCGATCCGCTGCACTACGAGGAC 27960 
FSRPLQLVHREFADPLHYED 

27961 CTGCGCGGCAGGAGCGCCGAGGAGCAGGACGCCCGCGTCGAGGAGTGGATCGAGCGGGAG 28020 
LRGRSAEEQDARVEEWIERE 

28021 AAGGAACGCGGCTTCGAGCTGCACGAGTTCCCGCTGATCCGCTTCATGGCGCAGCGCCTG 28080 
KERGFELHEFPLIRFMAQRL 

28081 GAGGACGACGTCTTCCAGTTCACCTACGGCTTCCACCACGAGATCGTGGACGGCTGGAGC 28140 
EDDVFQFTYGFHHEIVDGWS 

28141 GAAGCCCTGATGATCACCGAGCTGTTCAGCCACTACTTCTCGGTGATCTACGACGAGCCG 28200 
EALMITELFSHYFSVIYDEP 

28201 ATCGCGATCAAGCCACCCACCGCCGGCATGCGCGACGCCGTCGCCCTGGAGCTGGAGGCC 28260 
IAIKPPTAGMRDAVALELEA 

28261 CTCGCGGACCGCCGCAACTACGAGTTCTGGGACTCCTACCTCGCCGACGCCACCCTGATG 28320 
LADRRNYEFWDSYLADATLM 

28321 CGGCTGCCCAGGCCCGGCACCGGACCCCGGGCCGACAAGGGCGACCGGGACATCACCCGC 28380 
RLPRPGTGPRADKGDRDITR 

28381 ATCGCCGTCCCCGTCCCCACCGAACTCTCCGACGGCCTCAAGCGGGTCGCCGCCACCCAC 28440 
IAVPVPTELSDGLKRVAATH 

28441 GCCGTCCCGCTGAAGACCGTGCTCCTGGCCGCGCACATGGTGGTGATGTCCCTCTACGGC 28500 
AVPLKTVLLAAHMVVMSLYG 

28501 GGCCACGAGGACACCCTCACCTACACCGTCACCAACGGCCGCCCCGAC^CCGCCGACGGC 28560 
GHEDTLTYTVTNGRPETADG 

28561 AGCACCGCGATCGGGCTGTTCGTCAACAGCCTCGCGCTCCGCGTCCGGATGACCGGCGGC 28620 
STAIGLFVNSLALRVRMTGG 

28621 ACCTGGGCCGACCTGATCACCGCCACGCT^AGTCCGAGCGCGCCTCGATGCCGTACCGG 28680 
TWADLITATLESERASMPYR 

28681 CGGCTGCCGATGGCCGAACTCAAGCGCCACCAGGGCAACGAACCCCTGGCCGAGACGCT 28740 
RLPMAELKRHQGNEPLAETL 

28741 TTCTTCTTCACCAACTACCACGTCTTCCACGTGCTCGACCGCTGGATCGACCGCGGCOT 28800 
FFFTNYHVFHVLDRWIDRGV 

28801 GGCC^CGTCGCCAACGAGCTCTACGGCGA^ 28860 
GHVANELYGESTFPFCGIFR 

28861 CTGAACCGGGAGACCGGCgAgCTGGAGGTCCGCATCGAGTACGACAGCCTGCAGTTCTCC 28920 
LNRETGELEVRIEYDSLQFS 

28921 GACGCCCTCATGGAGAGCGicCGCGACAGCTACGCCCGCGTCCTCGCGGCCCTC^TCGCC 28980 
DALMESVRDSYARVLAALVA 

28981 GACCCCGAC^GCGCTACGACCGGC^CGAGTTCCG CTCCG ACCGCGACCGGG CCG CACT 29040 
DPDGRYDRHEFRSDRDRAAL 

29041 GCCGTCCTCACCCGCGGGCCCGAGGCGCCGGCGGCCGACCGGTGCCTGCACGACCTGGTG 29100 
AVLTRGPEAPAADRCLHDLV 

29101 GCGGACCGGGCGGCGGACCGCCCCGACGCCCCGGCCGTCCAGCTGGACACCGACGTGCTC 29160 
ADRAADRPDAPAVQLDTDVL 

29161 AGCTACGGCGAGCTCGACCGCCGCGCCAACCGGCTGGCC^CCACCTGCGTTCGCTCGGC 29220 
SYGELDRRANRLAHHLRSLG 

29221 ATCGGCCCGGAGAGCGTCX3TCGGCGTCCTGGCCGAACGCTCCCTTOCCCAGATCATCGGC 29280 
IGPE SVVGVLAERSLAQIIG 

29281 CTCCTCGCGGTCCTCAAGGCGCK3CGCCGCCTACGTCCCGCTCGACCCGGCCCAGCCCGAC 29340 
LLAVLKAGAAYVPLDPAQPD 

29341 GAGCGCCTCGCCGCCGTCATCGCCGGGAGCGGGGCCGCCGCCGTCCTCCACCGGCCCGGC 29400 
ERLAAVIAGSGAAAVLHRPG 

29401 CTCGAAGGGCGGCTGCCCGCGGGCGTCCGOTCGCTCCCCACCGACGCCGCCGACGGCAGC 29460 
LEGRLPAGVRALPTDAADGS 

2 9461 ACCGCCACG^CGACCCCGGGCCCACCGCCACGCCCCGCAACGCCGCGTACGTGATGTAC 29S20 
TATHDPGPTATPRNAAYVMY 

29521 ACCTCCGGATCCACCGGAGAGCCCAAGGG^TCGTCGTCGAACACCGCAACGTCGTGGCC 29580 
TSGSTGEPKGIVVEHRNVVA 

29581 TCCCTCGCCGCCCGCGGCGCCCACTACGCGGCCGGACCCGGCCGGTTCCTGCTGCTGTCC 29640 
SLAARGAHYAAGPGRFLLLS 

29641 TCCTTCGCCTTCGACAGCTCGGTCGCCGGCATCTTCTGGACGCTCACCCAGGGCGGCACC 29700 
SFAFDSSVAGIFWTLTQGGT 

29701 CTCGTCCTGCCCGGCGAGGGACAGCAACTCGACCCCGCCGCGCTGGTGGAGACCATCGCC 29760 
LVLPGEGQQLDPAALVETIA 

29761 CGGCAACGGCCCACCCACACCCTCGCCATCCCCTCCCTGCTGGCGCCCGTCCTGGAC^ 29820 
RQRPTHTLAIPSLLAPVLDQ 

29821 GCCGCCCCCGGCGACCTCGCCTCCCTGCGCACGGTGATCGCCGCGGGCGAGTCCTGTCCG 29880 
AAPGDLASLRTVIAAGESCP 

29881 GCCGAACTGGCCGCCGCCTGCCGGGACCTGCTGCCCGGGAGCACC^CCACAACGAGTAC 29940 
AELAAACRDLLPGSTFHNEY 

29941 GGCCCCACCGAGACCACCGTGTGGAGCACCGTCTGGTCCCAGGAGAACGAGCACGACGGA 30000 
GPTETTVWSTVWSQENEHDG 

30001 CCCCACCTCCCCATCGGCCGGCCGGTCGCGGGCACCTGGGTGCACCCCCGCGACCACCGC 30060 
PHLPIGRPVAGTWVHPRDHR 

30061 GGACGCACCGTCCCCCTCGGCGTCGCCGGCGAACTCTCCATCGGCGGCGCCGGCGTGGCC 30120 
GRTVPLGVAGELSIGGAGVA 

30121 CGCGGCTACCTCGGGCGCCCCCGGGACACCGC<3GCCGCCTTCCGCCCCGACCCCGAGGCC 30180 
RGYLGRPRDTAAAFRPDPEA 

30181 ACGGCTCCCOTCGGCCGCGCCTACGCCACCGGCGACCTCGGCCGCTACCTCCCCGACGGC 30240 
TAPGGRAYATGDLGRYLPDG 



30300 



30241 AACCTGGAGTTCCTCGGCCGCGCCGACCACCAGGTCAAGATCCGCGGCTTCCGGGTCGAG 
NLEFLGRADHQVKIRGFRVE 

30301 CTCGGCGAGATCGAGGCCGTCCTCGACACCCACCCGGAGCTCCAGCCK3ACCATCGTCATG 30360 
LGEIEAVLDTHPELQRTIVM 

30361 GCACGCGGCGACCACCCCGGCGACCAGGTGCTCGTCGCCTACGTCCTCCCCGCCCCCGGC 30420 
ARGDHPGDQVLVAYVLPAPG 

30421 CGGCGGCCCGAACCCGCCGACATCCAGGGGTACGTCCGCGACCGGCTGCCCCGCTACATG 30480 
RRPEPADIQGYVRDRLPRYM 

30481 GTGCCCACCGCGGTGATCGTCCTCGACGCGGTACCGCTGACCGCCGCCGGCAAGGTCGAC 30540 

VPTAVIVLDAVPLTAAGKVD 



30541 CGGGCCTCGCTCCCCGCC^ 30600 

RASLPAPSHAQLTR.DQbivr.^ 

30601 CCCGGCACCGACACCGAGCGGGCGCTCGCCGCCATC 30660 

30661 CGGATCGGGGCCGGTGACCGCTTCTTCGACGTCGGCGGCGAATCCCTGC^CGCGATGCAG 30720 

RIGAGDRFFDVGGE^^KMinv 

30721 GCCACCGCCGCGGCCAACAAGATGTTCCGCACCCGCGTCTCCGTCCGCCG 30780 

ATAAANKMFRTRV 

30781 GCGCCCTCCCTGCGGGAGTTCGCCCACGAGATCGAC^GGCCCGCCTCGCGGGCGGCGGG 30840 
APSLREFAHEIDKARLAO^u 

30841 ACC<3GCCTCACCGGCCCCGCa3CCGCCCCGGCCACC(^GGTGCCGCCGAATGACCCCGG 30900 
TGLTGPAAAPATGGAAE^ TpA (orf25) 

30901 CCGCCGACACCACCC.CCCGCTC^ 30960 

30961 TCGCGCCCGAGGTGCCCGCCTACAACATCTGCACCGCCATCGAGCTCACCGGCACACCGC 31020 
APEVPAYNICTAIELTfal r k 

31021 GCCCGGCGGCGCTGCGGGACGTGGTACGGCGGCTCGGCCGCAGGCACGAGGCGCTGCGCA 31080 
PAALRDVVRRLGRRHbAijK.1 

31081 CGGTGTTCCCGTCGGTGGGGGAGACCCCCCGCCAACGGGTCACCGACCGGGCGGCGCCCC 31140 

VFPSVGETPRQRVTDRAAfij 

31141 TGOGGACOG^ 31200 

RTVDLTHLTPAAAEAETAKi 

31201 CGCTACGGTGCGCCGCCGCCC^^ 31260 

LRCAAARPFRLDTGPbAbHi 

31261 CCCTGCTGCGCCGCGCCCCCGGCCACGCGCTGCTCGTCCTC^^ 31320 

LLRRAPGHALLVLSVHHI 

31321 XCGACGGCGGCTCGCTC^ 31380 

31381 TCGCCGGGCGCCCGGACCCOT^ 31440 

AGRPDPLGTPAPGYGRywK 

31441 CGCGGGCGGCGGAACAGGACGAGGCCGGGCGGGAGTTCTGG 31500 

RAAEQDEAGREFWRRELSGA 

31501 CGCCACCCCGCACGACCGTCTTCCGGGGCACCGGCCGGCCC^ 31560 

PPRTTVFRGTGRPAGPPARA 

31561 CCACCGTCCACTACGGCACCGACGATCCGGCCCCGACC^^ 31620 

TVHYGTDDPAPTADFCREHA 

31621 CCGTCACCGGCTACGTGCTGCTGCTCGCGGCCCTCGCCTGCCTGGTCGCCCGGTACACCG 31680 
VTGYVLLLAALACLVARYTG 

31681 G CCGG ACGGACGTGGTG ATCGG CTCACCCGTCGGACTG CGCGAGG ACCCCG AAGGGCT 31740 

31741 CCACCGTCGGCCCGATGCTCAACCTGCTGCCGCTGCGCC^^ 31800 

TVGPMLNLLPLRLRLHGDPt, 

31801 GCTTCGGCGAGGTCCTGGCCCGCACCCGGGAGACGCTGCTCGGCGCGCTGGAG^CCGCA 31860 
FGEVLARTRETLLGALEHKi 

31861 CCACACCGTTCGAGGACATCGTCGACGCGGTGGGCGCCGACCGGGACCCG^CGTCAGCC 31920 
TPFEDIVDAVGADRDPDVbf 

31921 CCCTCTTCCAGATCCTCTTCGCCCACGAACGCCCCCCG^ 31980 
L F Q I ^ 




31981 TCCGTGCCCGCGTCGTACCCGTCCCCGCTCCGGCCGCCAAGTACGAGCTCGCCGTCACCG 32040 
RARVVPVPAPAAKYELAVTA 

32041 CCACCGAGACGCCCGACGGGCTCCGGCTGATCGTCGAGGCGGAGCACGGACACGGGGAAC 32100 
TETPDGLRLIVEAEHGHGEF 

A E 

SYR 



,! cGGCCGAACTCGCCGCCrTCGCCCGC^CTTCGGCGTCCTGCTGG^ 32160 
LAAFARHFGVLLAAGVRA 



32161 CGCCGGACACACCG^^ 32220 

32221 CCGACACCACGGCCCC^^ 32280 

DTTAPRTAPEAP * k r 

32281 ^GGAGTCC^ 32340 

32341 TCAGCTACC^GAG^^ 32400 



32401 GCATCGGCACCGAGGACGTGGTCGGCGTC 32460 

32461 CGCTCCTCGCCGTCCTCAAGGCCGGCGC^ 32520 

LLAVLKAGAAYLPVU FAi-i r *\ 

32521 CCGAGCGGGTACGGCTGATGCTCGACGACGCCCGGGCCGCGCTGCTGCTCACCGAGACCG 32580 
ERVRLMLDDARAALLLTETA 

32581 CGCTCGGCACCCCGCCGGCCCCGGCCGGCACC^ 32640 
LGTPPAPAGTPVHHVDGrfr 

32641 CGCCGACCCGGCCCGGGGACGACGCCGACCACAC^ 32700 
PTRPGDDADHTGPDLPT&UA 

32701 CCTACCTCCTCTACACCTCCGGGTCGATCGGCCGGCCCAAGGCCGTGGCCCTC^GCACG 32760 
YLLYTSGSTGRPKAVALQHU 

32761 ACAGCGCCGCGGCGTTCCTGCGCTGGGCGGGCCGCGCCTTCGACGGCGGGGAGCTGGCCG 32820 
SAAAFLRWAGRAFDGGELAA 

32821 CCGTCCTGGCCACCACCTCCGCCGGCTTCGACCTGTCGGTCTTCGAGCTGTTCGCCCCCC 32880 

VLATTSAGFDLSVFELrAfiJ 

32881 TGGCCCACGGCGGCACCGTCGTCCTCGCCGACAGCGCCCTGCACGTGCCCGCCCTGCCCT 32940 
AHGGTVVLADSALHVPALPW 

32941 GGGCGCCCGCGGCGACGCTCCTGAACACCGTGCCCTCCGCGGCCGCCGCCCTGCTGGACG 33000 
APAATLLNTVPSAAAALLDA 

33001 CCGACGGCCTGCCCGACGGTCTGACGGCCGTCAACCTGGCGGGCGAGCCCCTGACCGCGG 33060 
DGLPDGLTAVNLAGEPLTAE 

33061 AGCTGGTCGCCCGGCTGCATOCCCGCCTGCCXJAAGGCCGCCGTCCGCAACCTCT 33120 
LVARLHARLPKAAVRNLYGP 

33121 CCTCGGAGGCCACCACCTACGCCAC^^ 33180 
SEATTYATAALVPAGGTEAf 

33181 CGGCCATCGGCCGGGCGCTGGGCGCGGCCCGCGTGTGGACCGCCGACGACCGGCAGCGCC 33240 
AIGRALGAARVWTADDRQRP 

33241 CCCTCCCCGGCGCGGTCGTCGGTGAACTCCTCATCGGCGGTACGGCCCCOTCCCGCGGCT 33300 
LPGAVVGELLIGGTArAK" 

33301 ACCTCGGCCGGCCGGGACCGACCGCCGACGCCCTCCGGCCCGATCCGACGGGACCGCCCG 33360 
LGRPGPTADAFRPDPTGf r u 

33361 GCTCCCGGCTCTACCGCACCGGGGACCTGGCCGTACGCCGCCCCGACGGCCGGTTCGTGT 33420 
SRLYRTGDLAVRRPDGRr vr 

33421 TCCTCGGCCGCAAGGACGAGCAGATCAAACTCCGCGGGGTGCGCATCGAACCGGGCGAGG 33480 
LGRKDEQIKLRGVRIEPGEV 



33481 TGGAAGCCGCTCTCCGCCAGTGCGCGCCGGT 33540 

33541 CcLlcOAG^CCACCGCCTCGTCGGC^ 33600 
TAENHRLVbr v 

33601 ACCCCGAGCGCACCCTCGCC^ 33660 
33661 CGCTGGTGGTGTGCGACGCCCTGCCGCTGACCGCCAACG 33720 

33721 TCGCCCGGCGGGCGCGC^ 33780 

33781 GCGTCGAGAAGGCGGTCGCCGCX3ATCTGGCGCGAGGTGCTCG 33840 

33841 TCCACCAGGGGTTCTTCGA^ 33900 

33901 GGCTGGTCGCGTCCGTCCATCCCGGCCTCCGG^^^ 33960 
33961 TCGCCGCGCTCGCCGCGTCCGTGGACGGGC^ 34020 

34021 ACGCGGCC^ 34080 

34081 GCGGACGATGAGCCATGCCGACGCGGGCGACG^GCTC 

MSHADAGDGLDAau 



34 

G R 



^r^rT^^TT^^^"?^^ 34260 
3.,„ «=™«™^^ 34320 

3.321 A^CGTCCTOKC^CGTcicOT^ 3 "" 



34 



345 



G V L G D V 6 L F D 
34381 GGCCGAAGTCATGGACCCGCAGCACCGGCTCTCCCTGGAGGAGGCGTCG^CGTCTTCGA 34440 
AEVMDPQHRLCLEEAWHvr 

34441 CACCGCCG^AC^ 34500 

34501 cLaGcIgTACCXGA^^^ 

34561 CGGCTTCCCGCTGCTGATCC^^ 34620 

34621 ACTGGGCCTCACCGGGCCGAGTTACGCCX3TCGGCTCGGCCTGCTQ3TCCT 34680 
LGLTGPSYAVGSACSSSJjVm 

34681 GGTGCACCTGGCCTGCCAGAGCCTGCTCACCGAGGAATGCGACAT^ 34740 

34741 GGTCTCGCTCCAAGTGCCGCAGGGCCAGGGGTACGTGCACGCCGACGACGGCATCTA 34800 

VSLQVPQGQGYVHAu^v 

34801 ACCCGACGGGCGCTGCGCCCCCTTCGACGCCGGCGCGGCGGGCACGGTGGGCG^CAGCGG 34860 

PDGRCAPFDAGAAL,! 
34861 CGTGGGCCTCGTCCTGCTCAAGCGGCTCGCCGACGCCGTGCGCGACGGGGACCGCGTCC^ 34920 

V G L V LL KR L A D i\ , v n. 

34921 CGCGGTGATCCTC^ 
34981 GCCCCGCGTC^C^^ 35040 

35041 CGCCGCGACCGTCGGCGTCCTGGAGGCG^CGGCACCGGCACCCGGCTG^CGATCCCGT 35100 
35101 CGAAGTGGCCGCGCTCACCCGG^ 35160 

35220 

35221 GATCAAGGCCGTGCTGGCGGTCCGCGAGGGCCT 35280 

35281 GCCC^CCCCGCCATCGACTTOSCCA^^ 35340 

PNPAlDFATTPFivi/i 

35341 CTGGCCGGAGGCGGACCACCCCCGCCGGGCCGGCGTCAGCTCC^CGGCATCGGGGGCAC 35400 
35401 CAACGCCCACGTGATCCTGGAACAGGCCCCGC^ 35460 

35461 CGGGGTGCCI^TGCCGTTGGTGGTGTCCGCCCGCACCCGCGAAGCACTGGCGGAGGCCGT 35520 

. . P M P L V V S A R T 
35521 CCGGGACCTGGCGGCGTGGTCGGCCCCGGAGCCGGGGACCCGGCTCGC 35580 

35581 CACGCTGGCCGGGCGCCGGGC^^ 35640 
35641 CGAGGCCGCGCGCCTGCTGGGCGGCGCGCGCGGM 35700 
35701 CGTGTTCCTmCCCCGGG^GGGCACCCTCCCGCCGGACACC^ 35760 

35761 GGACGTGCCGGCGTTCCGCGCCCACTTCGACGCCTGTGC^AAGGG^CGCCCCGCTCGG 35820 
DVPAFRAHFDACAEGFAfK" 

35821 CACCOA^ 35880 

35881 CCTCTI^^ 35940 

35941 CGCGATGCTCGGGCACAGCCTC^ 36000 

36001 CCTGCCGGAC^ 36060 
36061 CGGCCGC*TG^^^ 36120 

36121 GGTGGAGTTCAGCGCCTTCAACGCCCCCGGCCGCTGCGTCGTCGGCGGGCCCCCGGAGCC 36180 
VEFSAFNAPGRCVVGGP Pbf 

36181 GGTGG CGGAGCTGCGCGCCCGGCTGGCGCGGCGCGGAGTGCCGGCCGCCGAACTGG 36240 

VAELRARLARRGVPAAELAi 

36241 CGCGC*CGCCTTCCA^ 36300 

AHAFHSAAVEPLLDGFROVi, 

36301 GGAAGGCGTCCX3ACTGCGGCCGCCCCG 36360 

EGVRLRPPRLRYVSSbi^u 

36361 GGCCGACGCCGCGGTCACCACCCCCGCGTAClgCTCGCC 36420 



36421 

36481 G< 

36541 

36601 

36661 

36721 CG 
E 

36781 TO 
3684 
3690 
3696 
A D A A V T T P A V W L A H L R R P V 
CTTCOC^ 

ACTCTGGC^ 
CCGC^ 

Igtccgaacccacg^ 
_ . xgLLcgc^^ 

ppLAVDQRP GijKXVJ 
,1 CGCCCTGG^^^ 

,1 cL«CT^ 

„a ggggIgatcgccgcggagatc^ 

GTlAAElTAAHPSFb^ 

3,021 GCTCCGGCACTGCGCCCAG^^ 

LRHCAQGYFKAu^ 

37081 CGTCCTCTATCCGGCCGGCAGCGGCGACCTCCTGCGCCGCA 
CGAC^ 

GGCCGACCGCGAACCCGGCCGCCCGCTGCGCGTCCTGGAGGCCGGAGCG^CG^CGG^CAG 

ADREPGRPLR VLEAUA 

cctcaccca^^ 

CTCCCGGCACrTCGTGACCGC^CTCGGCCGGGAGGCCGCCCGGC^ 37380 
SRHFVTALGRbA^^^^ 

CCGCGC^CGCGTCCTCGACATCGCCCGCGACCCA^ 
RARVLDIARU^ 

37441 GTTCGACGTCCTCTGCGGCCTCGAGGTGGTCCACGC 

FDVVCGLDVVMHir 

CGGCCATCTGCGCTCCCTGATG^^ 

CGACGACCCCTCGCTGACGATGATCTGGOTCCTGACGGACGGCTGG 

CCGGCGCACCCACGGCCCGCTGC^^ 
RRTHGPLLDAAGWRAi.1. 

GCTCGCCCGGCAGACC^ 



37141 
37201 
37261 
37321 



37381 CO 
R 



36480 

36540 

36600 

36660 

36720 

36780 

36840 

36900 

36960 

37020 

37080 

37140 

37200 

37260 

37320 



37S01 
37561 
37621 
37681 
37741 
37801 



37440 
37500 
37560 
37620 
37680 
37740 
37800 
37860 



37861 GACGGGCGGCTGCCTGCTGCTGGGCGACGGGGAC^ 37920 

T G G 

L G D G D T 

37921 GGAGGCCCTCGGCGTGCCCGTCACCACCGTCGGCGGCGGCCGACCGCCGGGCCCCGAGCG 37980 
EALGVPVTTVGGGRPPGPER 

37981 GTACCGGGAACTCGT^ 38040 

38041 CGCGTCCCA^^ 38100 

S H 

G R A A 

A A 

38101 GCTGCACAACCTGCTCCACCTCGCCCGGGCCTTCGGCGCGCTGGAGGAGCGCCACCCCGC 38160 
LHNLLHLARAFGALEERHPA 

38161 CCGCGTCGTGACCGTGACCACCGGTGCCCACGACGTGCTCGGCGACGACCTCGCCCACCC 38220 
RVVTVTTGAHDVLGDDLAHP 

38221 CGAGCACGC^CCGTCCCGGCCGCGGCCAAGGTGATCCCCCGGGAGTACCCGTGGATCGC 38280 
EHATVPAAAKVIPREYPWIA 

38281 CTGCACCGCCCTGGACGTGGAGCCGGGCCTGGACGCCGAGCGGCTGGCGGACCTGATCGT 38340 
CTALDVEPGLDAERLADLIV 

38341 CCGGGAACTCGGCGCGGCGCGCGAGACCACCGTCACCGCCTGCCGCGGCCGACGCCGCTT 38400 
RELGAARETTVTACRGRRRF 

38401 CACCCCCTGCCCCGTCCGG^GCCCCTCCCCGCCGCACCGGAACGCCCGGCGGTCCGGCC 38460 

TPCPVRQPLP AAPERH 

38461 CGGCGGCGTCTACCTCGTCTGCGGCGGCCTCGGCGGCATCGGCCTCCACCTCGCCGAGTA 38520 
GGVYLVCGGLGGIGLHLAEY 

38521 CCTGGGCCGCGCCCGCACCACCGTCGTCCTCACCCACCGGCGGCCCTTTCCCGCCCCCG^ 38580 
LGRARTTVVLTHRRPFPAPfc 

38581 CGCGTGGGACGGGCTGCCCGCGGGACACCCGGAGGCGGCCGTCGTCCGGCGGCTGCGCTC 38640 
AWDGLPAGHPEAAVVRRLRS 

38641 CCTCGCCGCCACCGGCGCCACGGTCGTCGTCCGCCGGGCCGACCTCACCGACCAC^ 38700 
LAATGATVVVRRADLTDHDA 

38701 GATGCGCGCCCTCGCGGACGAGGTGGAACAGGCCCACGGCCCCGTCCGGGGGGTGGTGCA 38760 
MRALADEVEQAHGPVRGVVH 

38761 CGCGGCCGGGGTGCCCGACACCGCCGGCATGATCCAGCGTCGCGACCGAGCCG^CACGGA 38820 
AAGVPDTAGMIQRRORAt> i i> 

38821 CGCCGCCCTCGCCGCCAAACTGACCGGCACCCTCGTCCTGGACGAGGTGTTCGCCCACCG 38880 
AALAAKLTGTLVLDEVFAHR 

38881 CGACCTCGACTTCCTCGTCCTGTGCTCCTCTATCGGCACCGTGCTGCACAAGCTGAAGTT 38940 
DLDFLVLCSSIGTVLHKLKF 

38941 CGGCGAGGTCGGCTACGTGGCGGGCAACGAGTTCCTCGACGCCTATGCCGCCCACCGCGC 39000 
GEVGYVAGNEFLDAYAAHRA 

39001 GGCCCGCCGCCCCGGCAGAACCCTGTCGATCGCCTGGACCGACTGGCGGGAGTCGGGCAT 39060 
ARRPGRTLSIAWTDWRESGM 

39061 GTGGGCCGCCGCCCAGCGCOTTCTGACCGAGTO 39120 

39121 ACCGCCCGGGGGCGACCTGCTOMCGCGATCAGCCCCGAGGAGGGCGrc 39180 

L L G A 

p E E G V 



39181 CCGGCTGCTCGCCGCCGACACCGGCCCGAACGTCATCGTGTCGGCCCAGGACCTCGACGA 39240 
RLLAADTGPNVIVSAQDLDE 

39241 ACTCCTCGCGCGGCACGCG^ 39300 

LLARHAAYTTDDHLAALGUi, 

39301 GAGGATCGCCGCCGCCCGGGACC^^ 39360 

DRSAPA APYAArn 

R I A A A R 

39361 GCCCGCCCAGCGGCGG^ 39420 

39421 CCTCGACGACGACTTCTTCGCGCTCGGCGGGGACTCGCTGCTCGCCCTGTOCCTGCTC 39480 
LDDDFFALGGD^l.^^ 

39481 GCAGCTGCGGGACGCCTACGGGGTGGAGATCTC^ 39540 
39541 GGTGGCGGCGCTGGCCGCCGCCACCGGCCCGC^ 39600 

39601 GGTGGTGCTGTGACCACGCCCCGCATCACCGACCTGCTCACCGAGCTCCGCGGCCGGCAG 39660 

V V L „' T T P R I T D L L T E L R G R Q (orf23) 

39661 GTGACCCTCACGGCCGACGGGGACCH5CTGCACTC 39720 

VTLTADGDKbnt-^ 

39721 GACGAGCTCCTCGCCACCATCCGCGCCCGCCGCGACG 39780 
39781 GACCGCCGCATCCCGCGCCACGACGGGC^ 39840 
39841 TGGCTCCTCCACCAGTTCCAC^ 39900 

39901 CTGCGCGGGCCCCTGAACCCGGCCGCCCTGCGCGCCGCCCTGGCCGAGGTGGTACGGCGG 39960 
LRGPLNPAALRAALAEVVRR 

39961 CACGACGTCCTGCGCACCCX^TACG^ 40020 

HDVLRTRYAlSRGLfKfv 

40021 CCGGCCCACACGC^ 40080 

P A 

40081 GACGCCGAACTCGCCCGGCTGGCCGCCCAGGAGGCCAG^ 40140 

40141 GGCCCGGTGCTGCGGGCCCGGCTCCTCCGAACGGCCCCCGAGGAGCACCGGCTGCTGCTG 40200 
GPVLRARLLRTAPEEHRLLL 

40201 ACCCGCCATCACATCGCCAGCGACGGCTGGTCGCTCGACATCCTGCTCCGCGAACTGGGC 40260 
TRHHIASDGWSLDILLRELG 

40261 ACGTTCTACCGGGCAGGGCGGGACGGCACACCCGCCGGCCTCGACGCCCTGCCGCTGCGG 40320 
TFYRAGRDGTPAGLDALPLR 

40321 TAMCCGACTTCOCCSa 40380 

40381 TCGACCCGCTGGGCACGGCACCTGAGGGGCGCCCCCGCGACACTCGACGTCCTCGGGCCC 40440 
STRWARHLRGAPATLDVLGP 

40441 CCGCCCGCCGAACCCTCCCACGCGCCGGCCGG^^^ 40500 

40501 CTCGTCACCGGCCTGCGGCAGCTGGGCGGCCGGGCCCGCACCACGCTCTTCCCGCTCCTG 40560 
LVTGLRQLGGRARTTLFPLL 

40561 CTGAGCGCC^CGGCCTCGCCCTGGCCGGCCCGCCCG^ 40620 

40621 ATCCCCGTCGCCGGCCGGCCGCGCACCGAACTGGAGCCGCTCATCGGCTGCTTCGCGACC 40680 
IPVAGRPRTELEPLIGCFAT 

40681 ATCGCGCCGATGCGGCTGACGAGCGACGGGACCGAGCCGCTC 40740 

IAPMRLTSDGTEPLTRbAAK 

40741 GCCCAGCAG^CGTCCAGGACGCGCTGGACGGACCCGAC^ 40800 

40801 CACGCGCTGCGTCCGGAGCGGGACCTCGCGGAGAACCCCCTGTTCTCGGCGTCGTTCGCC 40860 
HALRPERDLAENPLFSASFA 

40861 TTCCAGAACACCCCGCGGACMCCGTGCGCCTC^ 40920 
FQNTPRTAVRLPGLDAEVbl' 



40921 TCGCCGCCCGTGGCCCCCAAGTTCCCGCTGGCCCTCACCGCGACGGCGCGGGCCGACGGC 40980 

SPPVAPKFPLALTATARADG 

40981 GGAATGGGCCTGGAGCTGGAGTTCC^CCGGGACCGGM 41040 
GMGLELEFDRDRIAEPVAKO 

41041 ATCCTCACGTCCTTCCACGCCGCCCT^ 41100 
ILTSFHAALARAVADPEAPA 

41101 GCGCCCGTACCGGCCGCCGCCGTGGACCGGCGGCCCGGGCGCGAAGGACACGAGTGCCTC 41160 
APVPAAAVDRRPGREGHECL 

41161 CACGAGCCGGTGGCGCGGGCGGCGGCACGCCACCCCGACGCCGTCGCCGTCAGCTGCGGC 41220 
HEPVARAAARHPDAVAVSCG 

41221 GGCACCCAGCTCAGCTACGGGGCGCTCGACACCTO 41280 

41281 CGCGCCCACGGCGCCGGCCCCGAGCGGCTGG^ 41340 
RAHGAGPERLVALCLPTGPE 

41341 TGGGTCGTCGGCGCCCTCGCCATCCTCAAGTCCGGCGCCGCCTACCTGCCGCTCGACCCC 41400 
WVVGALAILKSGAAYLPLDP 

41401 GGCGACCCGGCCGAGCGCCGCGCCTCCGTCGCCGCCGACGCGGGAGCGACGCTGATCGTC 41460 
GDPAERRASVAADAGATLIV 

41461 TCCGACACCGCGCT^ 41520 

41521 GCCCCCGAGCCCACCGCCCGGGCCGTCCTGCCCGGCAACCTCGCCTACGCCGTCTACACC 41580 
APEPTARAVLPGNLAYAVYT 



41581 TCCGGCTCCACCGGCGGCCCCAAGGGCGTGCTCGTCACCCATGCCAACOT 41640 

41641 CTGGCCGCGTGCCGTGAGGCCCTGCCCGCCCTGGACGCCCCCCGGACCTGGTCGGCGACC 41700 
LAACREALPALDAPRTWSAT 



41701 CACTCGCCGGCCTTCGACTTCTCCGTCTGGGAGGTCTGGGGCCCGCTGACCGCCGGCGGA 41760 
HSPAFDFSVWEVWGPLTAGG 



41761 CGCCTCGTCCTCGTGCCCCCGGACGTGGCCCGGGCCCCGGACGAACTGTGGGACACCCTC 41820 
RLVLVPPDVARAPDELWDTL 

41821 CGCGACGAACAGGTCGAAGTCCTCAGCCAGACCCCCAGCGCGTTCCACCACCTCCTGCCC 41880 
RDEQVEVLSQTPSAFHHLLP 

41881 ACCGCCGTGCGCCGGGCGGCCCAGGCCACCGCGCTCGAACTCGTCGTCCTCGGCGGCGAG 41940 
TAVRRAAQATALELVVLGGE 

41941 GCGTGCGAGCCCGCCCGTCTGACGCCTTGGTGGGACGCCCTGGGCGACCGGCGCCCGGCC 42000 
ACEPARLTPWWDALGDRRPA 

42001 GTGGTCAACATGTACGGCATCACCGAGAACACCATCCACGTCACCGTCCGCCGGATGA 42060 
VVNMYGITENTIHVTVRRMT 

42061 GCGGCGGACCGGTCGGGCAGTCCCGTCGGCCGGCCGCTGCCGGGGCAGOTCGCCGACCTT 42120 
AADRSGSPVGRPLPGQRADL 

42121 CTCGACCCCCACGGCCGGCCCGTCGCGCCGGGCGGGCGGGGCGAACTGTTCGTCGGCGGC 42180 
LDPHGRPVAPGGRGELFVGG 

42181 GTCGGACTGGCCCGCGGCTACCTCGGCCGGCCCGGCCTCACCGCCCGGAGCTTCCTGCCG 42240 
VGLARGYLGRPGLTARSFLP 

42241 GACGACACCCCCGGCTGGCCGGGCGCGCGCCGCTACCGCTCCGGAGACCTGGCCCGGCTG 42300 
DDTPGWPGARRYRSGDLARL 

43501 ^444^^^^^^?^ 43560 

43561 44444?Trrrs^ 43620 

43621 4444444^^™^;^?? 43680 

43681 ^^r^^^TTT^^w 43740 
43741 CCGGATCGACCCGCCGCCCCCGCGGCGTCATGGTCGGCCACGGCAATCTGCTGGCCAACG 43800 

GSTRRPRGVMVGHGNLLANE 

43801 AGCGCTGCATCGCCGCCGCCTGCGGCCACGACCGGGACTCCACCTTCGTGGGATGGGCGC 43860 

RCIAAACGHDRDSTFVGWAP 

43861 CGTTCTTCCACGACATGGGCCTGGTCGCCAACCTCCTCCAGCCCCTCTACCTCGGGTCCC 43920 
FFHDMGLVANLLQPLYLGSL 

43921 TGTCGGTGCTGATGCCGCCGATGGCCTTCCTCCAGCGCCCGGCCCGCTGGCTGCGGGCCG 43980 

SVLMPPMAFLQRPARWLRAV 

43981 TCTCCCGCTACCGGGCGCACACCAGCGGCGGCCCCAACTTCGCCTACGACCTGTGTGTCG 44040 

SRYRAHTSGGPNFAYDLCVD 

44041 ACCGGGTCGGCGAGGACGAGCGGGCCGGACTGGACCTGTCGGGCTGGAAGGTCGCCTACA 44100 
RVGEDERAGLDLSGWKVAYN 

44101 ACGGCGCGGAACCTGTACGGGCCGACACCCTGCGACGGTTCACCGACCGCTTCGCCCCCC 44160 
GAEPVRADTLRRFTDRFAPH 

44161 ACGGCTTCACCCCCGGCGCGCACTTCCCGACCTACGGGCTCGCCGAGGCGACCCTGCTCG 44220 
GFTPGAHFPTYGLAEATLLV 

44221 TCGCCACCGGCCCCAAGGGAGTGCCGCCCCGCACCCTGACCGCCGACCGCGCCGCCCTGC 44280 
ATGPKGVPPRTLTADRAALR 

44281 GCGCCGGCCGGCTCCGGCCCGCCGGGCCCGGCGAGGCCGGCCTGGAACTGGTCGGCAACG 44340 
AGRLRPAGPGEAGLELVGNG 

44341 GCACCGCCGGCCTCGACACCACCCTCCGGATCGTCGACCCCGCGACCGCGCGGGAGTGCC 44400 
TAGLDTTLRIVDPATARECP 

44401 CGCCCGGAGAGGTCGGCGAGGTCTGGGTGCGCGGCCCGGGCGTGGCACGCGGCTACTTCG 44460 
PGEVGEVWVRGPGVARGYFG 

44461 GCCGCCCGCGCGAGTCCGCGCCGCTGCTCGCCGCCCGCCTGCCCGGCGGCGAAGGACCGT 44520 
RPRESAPLLAARLPGGEGPY 

44521 ACCTGCGGACCGGGGACCTGGGCGCCCTGCACGACGGGGAACTCTTCCTCACCGGACGCC 44580 
LRTGDLGALHDGELFLTGRH 

44581 ACAAGGACCTCATCGTCATCCGCGGCCAGAACCACCACCCGCACGACCTCGAACGGACCG 44640 
KDLIVIRGQNHHPHDLERTA 

44641 CCGAGCAGGCCCACCCGGCGCTCCGCCCGACCTGCGCCGCCGCGTTCGCGGTGCCCGGGG 44700 
EQAHPALRPTCAAAFAVPGD 

44701 ACGGCGCGGAGCGGCTCGTGCTCGTCTGCGAACTCACCTCCTACCGCGCCGTCGACCCGG 44760 
GAERLVLVCELTSYRAVDPA 

44761 CCGCCGTCGCCGAGGCCGTCCGGGCCGCGCTCGCCGCGCGGCACGGCGTCGCCCCGCACA 44820 
AVAEAVRAALAARHGVAPHT 

44821 CGCTGGTGGTGCTGCGCCGCGGCGGCATCCCCAAGACCACCAGCGGAAAGGTGCGGCGCG 44880 
LVVLRRGGIPKTTSGKVRRG 

44881 GCCACTGCCGGACGGCCTACCTCGACGGAACGCTCCCCGTTCACACGGCCGTCCGCCTCC 44940 
HCRTAYLDGTLPVHTAVRLP 

44941 CGGCGGGGGAGGAGGGCACCGACMCCCTTCCCCTGACC^CGGACCCCGGTCGGCTGGCCA 45000 
AGEEGTEALPLTTDPGRLAT 

45001 CCK3CGCTG(XCGACCTGGCCGCCGCCCACGCGGGCCTGGCCGGGCCCCTCCCCGGCACCG 45060 
ALRDLAAAHAGLAGPLPGTD 

45061 ACGAGCCGGTGAGCGCCCTCGGCCTGGACTCGCTCGCCTCCCTGCGGCTCCACCACCACG 45120 
EPVSALGLDSLASLRLHHHV 

45121 TCCAGTCCGCCTACGGCGTGACCCTGCCCGTCACCGCCCTGCTCGGCGACACCACTTACC 45180 
QSAYGVTLPVTALLGDTTYR 

45181 GCCGGCTCGCGGAGCTGACGCTCGCCGCCCCCCGCCCGGCCCGGGCGCCCGAGGGGCAAG 45240 
RLAELTLAAPRPARAPEGQV 
45241 TCACCGGCGTCTGGCGGCCGTTGACGCACGGGCAGCGCGCCCTGTGGTACGAACAGGCG^ 45300 
TGVWRPLTHGQRALWYEQAb 

45301 TCGCCCCGCACGCGGCCGCCTACCACCTCGTCCGCGCGCTGGCCCTCCGCGGCC^ 45360 
APHAAAYHLVRALALRGPVD 



45361 ACGAGGAGGCCCTCGCCGAGGCGGTCCGCCGCGTCGTCCGC 45420 

A L 

45421 rrrGCTTCGCGCTCCGCGACGGCGAACCGGCGCGCCGGACCGAGCCGTACGGACCGGAGC 45480 
RFALRDGEPARRTEPYGPEL 

45481 TGGACGTACGCGACGCCACCGGCCTGCCGGCGGACCGGCTCCGCGAACACCTGGCCG^ 45540 
DVRDATGLPADRLREHLAAA 

45541 CGGGCGACCGCCCCTTCGACCTGGCCGCCGGCGACAGGCCC^ 45600 
GDRPFDLAAGDRPVRLTLYR 

45601 GCACGGACGGCGGCCACATCCTGCTGCTGGTCG 45660 
TDGGHILLLVAHHLVADFWS 

45661 CCCTCGTCGTCCTCCTGGGCGACCTCGCCCGGGCCCACGCGGGCGAGGACCTGCCGCCCG 45720 
LVVLLGDLARAHAGEDLPPA 

45721 CGCCGGAGGGGGACCCCGGCGACGAGGCGACGGACGCGGACCGGACGTACTGGCGGCACC 45780 
PEGDPGDEATDADRTYWRHR 

45781 GGCTCGCCGACGCGCCACCCGCCCTCGACCTGCCCACCGACCTCCCC^ 45840 
LADAPPALDLPTDLPHPAER 

45841 GCGGCTTCGCCGGCGCCACCCACGCCTTCCGGCTGCCCCCGGACCTCACCGCCCGGCTGA 45900 
GFAGATHAFRLPPDLTARLT 

45901 CCGCCCTCTCCCGGGAACGG^ 45960 
ALSRERHCTLFTTLLAAHQL 

45961 TACTGCTCCACCGCCTGACCGGGCAGGACGACCTCGTCGTGGGCACCCTCCTCGCCCGCC 46020 
LLHRLTGQDDLVVGTLLARR 

46021 GCGACACCGCCGAAGCGGCCGGCGCCGTCGGCTACCTGGTCAACCCGCTGCCGCTGC^ 46080 
DTAEAAGAVGYLVNPLPLRS 

46081 CCGTACGGGAGCCGGGGGAGACCTTCACGGAACTGCTGCGCCGCACCCGGCGGACCGTGC 46140 
VREPGETFTELLRRTRRTVL 

46141 TGGACGCC^TCGCGCACGGCCGCCACCCCTTCGGGCCGCTCGTCTCCCG^ 46200 
DAVAHGRHPFGPLVSRLAPA 

46201 CGCGCACGCCCGGCCGCGCGCCGCTCCTGCAGAGCCTGTTCGTGCTCCAGCGCGAGTACG 46260 
RTPGRAPLLQSLFVLQREYG 

46261 GCGACGAGGCGGAOJGGTACCGCGCGCTCGCCCTGGGCGTCGGCGGCCGGCTGCGCGTCG 46320 
DEADGYRALALGVGGRLRVG 

46321 GCGGACTCGACCTGGAGGCACTCGCGTTGCCGCGCCGCTGGTCGCAGCTCGACCTCTCGC 46380 
GLDLEALALPRRWSQLDLSL 

46381 TGAGCATGGO^^ 46440 

SMARLGDGLTGVWEYRTDLt 

46441 TCACCGAGGCCACGGTCGCGGAGCTGAGCGAGGCGTTCGTCCACCTGCTGCGGGCGGCCG 46500 
TEATVAELSEAFVHLLRAAV 

46501 TCGAGGACCCGGGCGCGCCCGTGGAGACGCTGCCGCTCACCGGCGGCCGGGAGACCGGGC 46560 
EDPGAPVETLPLTGGRETGP 

46561 CGCGCCGCGGCCCGTCGGCGGCCCGGCCCGCCCTCCCGCTGCACCGGCTCGTGGCCGCGG 46620 
RRGPSAARPALPLHRLVAAA 

46621 CGGCGCGCCGCGATCCCGCACGGACGGCGGTCGTCGCACTCGCCCCGGACGGCACCGCCC 46680 
ARRDPARTAVVALAPDGTAH 

46681 ACCACATCAGCCACGGAGCCCTGCACCGCGCGGCCACCACCCTCGCCGCCCGGCTCCGCC 46740 
HISHGALHRAATTLAARLRR 



46741 GGGAGGGCGCCGGCCCGGAGCGGCCCGTCGCCGTGCTCGTCGAGCGGGGCCCCTGGCTGC 46800 

EGAGPERPVAVLVERGPWLP 

46801 CCGTCGCC^ 46860 

46861 A CCCCCCGCACAGGCTC^^ 46920 

ppHRLARi irtwj 

46921 AGACCGGGACCGCCTCGCGCGCGGCCGAGGCGGCCGGTCCCGGCGTACGCG^ 46980 
TGTASRAAbA^^^ 

46981 TGCGTGAGGGTGCCACCGGCGGCGAG^ 47040 

47041 CGTACCTGCTGTACACCTCCG^GTCGACG 47100 

47101 GGGCCATCGTCAACCGCCTCCTGTGGATGCAGGAG 47160 

AIVNRLLWMQ^i x n *j «■ * 

47161 GGGTCCTGCACAAGACGCCGGTG 47220 

47221 TGACCGCCGGGGCGACCGTCGT^ 47280 

47281 TCGTCC^GCGGATCGCCCGCGAGGC^ 47340 

47341 CCCCGTTCCTCACCGAGCTCGCC^^ 47400 

47401 tgLcaccggggaag^gctgcccg^ 47460 
47461 cccggctgta^^ 47520 

47521 GCCGCCCGCCCGAGCCGGGGCCGGTGCCGATCGGCCTO 47580 

47581 AGGTCCTCGACGGCCGGCTGCG^^ 47640 

VLDGRLRPLt' K ^ v w 

47641 GCG^CGCCTGCCTGGCCCATG^^ 47700 

47701 TTCCGGCCCCCGGCGGCGGGCGCCGCTACTOCACCGGGGACCTCGTCCGCC 47760 

PAPGGGRHxki \s is 
47761 ACGGGGCACTGGTGTTCCGGGGACGCACGGACGACCAGGTGAAGATCGGCGGCATCCGGG 47820 

GALVFRGRTDDQVKIGGIRV 

47821 TCGAGCCCGGCGAGGTGGCGGAGGCGCTTCGGGCCCTGCCCGGCGTCGCCGACGCCGCGG 47880 

EPGEVAEALRALPGVADAAV 

47881 TCGTCCCGCACGACGGGCGGCTGGCGGCGTACGCGGTCGCC^ACCCGGTCGGCCCGGCCC 47940 

VPHDGRLAAYAVAUf 

47941 CX3GCGGCGGACGCCCTGCGGGACGCGCTGCGCAGGCGGCTGCCCGGCCACCTGGTGCCCG 48000 
AADALRDALRRRLPGHLVPA 

48001 CCGCCCTCACCCTGCTGG^ 48060 

48061 CGCTGCCCCACCCGTCGGCCCCGCCCCC^ 48120 

LPHPSAPPPDGGKrr 

48121 AACGGCTCGTCGCCCGGGTGTGGGCCGAACGCCTCGGACGGGAAGTCGTCGGCGTGGACC 48180 

RLVARVWAERLGREVVGVDR 



48181 GGGACTTCTTCTCCCTGGGCGGCGACTCCGTCCGGGCCCTCGGCGTGACGGCGGCCCTGC 48240 
DFFSLGGDSVRALGVTAALR 

48241 GCGCCGCCGGGCTCCCGGTGACGGTCACCGACCTCCTGCGCCTGCCCACCGTGGCCGCCC 48300 
AAGLPVTVTDLLRLPTVAAL 

48301 TCGCCCGCCACGCCGACGAGCGGGCGGATCGCCGACCGGCGCGACAGGAGACGCCCCCCG 48360 
ARHADERADRRPARQETPPG 

48361 GGCCGTTCGCCCTCTGCCCGGAAGCCGCCGGCGTGCCCGGCCTGGAGGACGCCTACCCGA 48420 
PFALCPEAAGVPGLEDAYPM 

48421 TGTCGATGGCCCAGCGGGCCGTGCTCTTCCACCGTGACCACAACCCCGGCTACGAGGTCT 48480 
SMAQRAVLFHRDHNPGYEVY 

48481 ACGTCACCAGCGTCGCCGTCTCCACGCCCCTGGACCGCACACGGCTCGCCGCGGCCGTGG 48540 
VTSVAVSTPLDRTRLAAAVD 

48541 ACCGGCTGCTGGACCGGCACGCCTATCTGCGGTCCTCCTTCGACCTCGTGTCCCACCCGG 48600 
RLLDRHAYLRSSFDLVSHPE 

48601 AGCCCACCCAGCTCGTCTGGACCCACCTGCCCACCCCGCTCGAGGTGGTGGAGTCGTCCG 48660 
PTQLVWTHLPTPLEVVESSD 

48661 ACCCCGCCGGTTTCGACGCGTGGCTGCACGCCGAACGCAAGCGCCCCCTCGACGTCGGCA 48720 
PAGFDAWLHAERKRPLDVGT 

48721 CCGGACCGCTGGCCCGGTTCACCGCGCACGACGCGGGAGCCGCCGGATTCCGGCTGACCG 48780 
GPLARFTAHDAGAAGFRLTV 

48781 TCAGCAGCTTCGCCCTCGACGGCTGGTGCGTGGCCACCGTGCTCACCGAACTGCTCCGCG 48840 
SSFALDGWCVATVLTELLRD 

48841 ACTACTGGTCCGCGCTGCGCGGCGCGCCCCTCAGCCTCCCGGCACCCGCCGCCTCCTACC 48900 
YWSALRGAPLSLPAPAASYR 

48901 GCGAGTTCGTCGCCCTCGAACGCGCCGCCCAACACGATCCGGCGCACCGGGAGTTCTGGC 48960 
EFVALERAAQHDPAHREFWR 

48961 GGACGGAGCTCGCCGGTGCCCGGCCGCATCCGCTGCCCCGCCGCCCGGTGCCACCGCCCG 49020 
TELAGARPHPLPRRPVPPPG 

49021 GGCCGGACGGGATCCGCCAGCACCGTCACGTCGTCCCCGTCGAGGACACCGTCGCCAAGG 49080 
PDGIRQHRHVVPVEDTVAKG 

49081 GCCTGTCGGCGCTCGCCGGCGAGCTGGGTGTCGGGCTCAAACACGTTCTGCTCGGCGTCC 49140 
LSALAGELGVGLKHVLLGVH 

49141 ACCTGCGGGTCGTCCGGGCCCTGTCCGGCGACCCCGACGTCATCACGGCCGTGGAGACCC 49200 
LRVVRALSGDPDVITAVETH 

49201 ACGGCCGCCTCGAACGGCACGACGGCGACCGCGTCCTCGGGGTGTTCAACAACATCCTGC 49260 
GRLERHDGDRVLGVFNNILP 

49261 CGCTGCGGCAGCGGGTGGACGGCGGGAGCTGGGCCGACCTGGCCCGCGCCGCGCACGCCG 49320 
LRQRVDGGSWADLARAAHAA 

49321 CGGAGGCGC(^ACGGGGGAGTACCGCCGCTATCCGCTGGCCCAGGCACAGCGCGACCACG 49380 
EARTGEYRRYPLAQAQRDHG 

49381 GCGCGGCCG^CTCTTCGACACCCTCTTCGTGTTCACCCACTTCCACCTCTACCGCGCGC 49440 
AAGLFDTLFVFTHFHLYRAL 

49441 TGGCCGACCTGGACGGCATGGCGGTCTCCGACCTGCGGGCCCCCGACCAGACCTACGTAC 49500 
ADLDGMAVSDLRAPDQTYVP 

49501 CGCTCACCGCCCACTTCAACGTCGACGCCACGGACGGCGGCGGCCTGCGGCTGCTGCTGG 49560 
LTAHFNVDATDGGGLRLLLE 

49561 AGTCGGACCCGCGGGAGTTCCCCGACGAGCAGGTCGCGGAGTTCGCCGCGTACTACCGCC 49620 
SDPREFPDEQVAEFAAYYRR 

49621 GCGCGCTGCGGGCCGCCGCCGACGCCCCGCACCGGCCGTACCGGGACACGCCGTTGACGG 49680 
ALRAAADAPHRPYRDTPLTD 
49681 ACCGGCCGGCCGGTCCGGCGCCGCACCGCGCGGAG^CTCCGTC^CGCCCTGTTCGCGG 49740 
R pAGPAPHRAERSVHAl,r«« 

49741 CCCCGGCCCGGAACCACCCGGACCGGATCGCGCTCGACGGCGAGGACGGGCCGGTCAGCC 49800 

PARNHPDRIALDGEDGPVSH 

49801 ACGGCGCCCTGGCCCGGCGCGCCGCCCGCCTCGCCGGAACGCTGCGGGCCGCGGGCGCCG 49860 
GALARRAARLAGTLRAAGAG 

49861 GGCCGGACACCGTCGTCGGG^ 49920 

49921 TGGCCGCCCTCCACGCCGGAGCCGCCTACCTGCCCCTGGACCCGG 49980 

49981 GGCAGCGGCAGGTGCTCACCGAGGCCGGCGCCCGCCTGCTCGTCCTGCCCGCCGGCCTCG 50040 
QRQVLTEAGARLLVLPAGLD 

50041 ACACCCCGCTCCGGGCCTGCGGCCTGCCCGTCGTGGCCCCGGACGACCTCGGCGCGCCCA 50100 
TPLRACGLPVVAPDDLGAPI 

50101 TCGCCCCCGTGTCCGTCCACCCGGAGCAGCTGGCGGCGG^ 50160 

A P 

50161 CCGC^ACGCCCAAGACGATCGGCGTCCCGCAGCGCGCCCTGGCC^ 50220 

GTPKTIGVPQRALAGYLRWA 

50221 CGATCGGCCACTACCGCCTCGACGAGGAGACCGTC^ 50280 
IGHYRLDEETVSPVHSSLGF 

50281 TCGACCTGACCGTCACCGCGCTGCTCGCACCGCTGGCCGCCGGCGGGCAGGCGCGGCTGA 50340 
DLTVTALLAPLAAGGQARLT 

50341 CCGACTCCGGCGACCCGGGTGCCCTCGGCGCGGCACTGGCCGCCGGC^ 50400 
DSGDPGALGAALAAGHHTLLt 

50401 TCAAGATCACCCCGGCCCATCTGGCCGCCCTCGCCCACCAGTTGGGCGCGCCGACCGCAC 50460 
KITPAHLAALAHQLGAPTAL 

50461 TGCGCACCGTCGTGGCCGGGGGCGAACCCCTGCACGCCGGC^CGTCCGCGCCCTC^CG 50520 
RTVVAGGEPLHAGHVRAIjKA 

50521 CCTTCGCGCCCGGCGCCCGGCTCGTCAACGAGTACGGGCCGACCGAGACCACCGTCGGCT 50580 
FAPGARLVNEYGPTETTVGC 

50581 GCTGTGCCCACGACGTCGCACCGGACCCCGGCGAGGCGCCCATCCCCGTCGGTAC^ 50640 
CAHDVAPDPGEAPIPVGTP1 

50641 TCGCGGGCCTCAGCGCGTGCGTCGTCGACGACGCGCTGCCCGCACCGCCCGGCGTGCGGG 50700 
AGLSACVVDDALPAPPGVRG 

50701 GCGAGCTGTACATCGGCGGGACGGGCGTCACCCGCGGCTACCTGGGCCGGCCCGCX3GCCA 50760 
ELYIGGTGVTRGYLGRPAAT 

50761 CCGCCGCCGCCTACGTGC^^ 50820 

AAAYVPDPAAPGARRYRTt>u 

50821 ACCTGGCACGCCGGCTGCCGGACGGCACCCTGCTCCTGGCGGGGCGCGCCGACCGC^GG 50880 
LARRLPDGTLLLAGRADRQV 

50881 TGAAGATCCGCGGCCACCGGGTGGAACCGGGGGAGGTCGAGCAGGTGCTCGGCGGC^CC 50940 

KIRGHRVEPGEVEQVLGGHP 

50941 CCGGGGTGCGGGAGGCGGCGGTCGTCGCCCACCCGGCACCCGGCGGCGGCCGCCGGCTGG 51000 
GVREAAVVAHPAPGGGRRLV 

51001 TCGCGTACTGGGTACCGGCCGAACCGGCCCGGCCACCGTCCGCGGACGCGCTCACCGCGC 51060 

AYWVPAEPARPPSADALTAL 

51061 TGCTCGCCGACCGGCTGCC^CCGTACGCGGTCCCCGCCGAACTCGTCCGCCTGCCCGCCC 51120 
LADRLPPYAVPAELVRLPAL 

51121 TGCCCACCACCCCCAACGGCAAGGTCGACCACACCCGGCTGCCCGCGGCCGGACGGGACC 51180 
PTTPNGKVDHTRLPAAGRDR 

51181 GGCGACTGGCGGAACTGCTCGACCGGATCGAGGCACTGTCCGACGCCGAGGCGGCCTCX3G 51240 
RLAELLDRIEALSDAEAASA 

51241 CACTGCGCGACAGCCGGCCCGCACCCGGGAGTGGCGATGACCGAGCATGACGACCACCCG 51300 
LRDSRPAPGSGDDRA* 

51301 CCGGCCCGCCGGGGCCCCGCCGGTTCCGCTGGCCCCGGCGGAAGCCCGCCCGTCCCGCAC 51360 

51361 GTGCCGGTGCCCGGGCATGACGACCGCGTCGGACGGCTGCCGGCGGACCGGAGCGTCCCG 51420 

51421 CCGACCCGCCGATTCTCTGGGGACCCCGCCGGTTCCGGTGGTGGCCCGCCCGTCCCGCAC 51480 

51481 CCGGAGGTGCCGATGCGCGGGCATGACGACCGCGTCGGACGGCTGTCGGCGGACTGGAGC 51540 

MRGHDDRVGRLSADWS (orf21) 

51541 GTCCCGCCGACCCGCCTGCCCGCCGGGGACCCGGCCGGTTCCGTCGGCCCCGGCGGAGGC 51600 
VPPTRLPAGDPAGSVGPGGG 

51601 CCGCCCGTCCCGCACGAGGAGGTGACGATGTCGGAGTATGACGACCGCCTCGCGCGGCTG 51660 
PPVPHEEVTMSEYDDRLARL 

51661 TCGGACAACCAGCGCGCCCTGCTGGACCGCTGGCTCGCCGAGGACCCCGCCGGCGGTGCC 51720 
SDNQRALLDRWLAEDPAGGA 

51721 GGCCCGCTTCGCCCCGACGGCCGCCCGCCCCGCACCGAGGCCGAGCGGATCCTGGCCGGG 51780 
GPLRPDGRPPRTEAERILAG 

51781 GTCTGGGAGGAGGTGCTGGAGACCGGCGGGATCGGCGCCGACGACGACTACTTCGCGCTC 51840 
VWEEVLETGGIGADDDYFAL 

51841 GGCGGAGACTCCGTCCACGCCATCGTCATCGTGGCGAAGGCCCGGCAGGCCGGACTCGCC 51900 
GGDSVHAIVIVAKARQAGLA 

51901 CTGACCGCCCATGACCTCTTCGAGGCCAGGACCCTCGCGGCCGTGGCGCGGAGAGCCGCC 51960 
LTAHDLFEARTLAAVARRAA 

51961 CCGGCCGGCCCCGCCGAGCCCGTCCCCGACGCGGGCGGCGGCGCGGTCCGGTACCCGCTG 52020 
PAGPAEPVPDAGGGAVRYPL 

52021 ACCCCTATGCAGCAGGGCATGCTCTACCACTCGGCCGGCGGCAGCACGCCCGGCGCCTAC 52080 
TPMQQGMLYHSAGGSTPGAY 

52081 GTGGTGCAGGTGTGCTGCCGGCTGACGGGGGACCTCGACGTGGCCGCCTTCCGCACCGCC 52140 
VVQVCCRLTGDLDVAAFRTA 

52141 TGGCAGGCCGTGCTGTCCGCCAACCCGGCGCTGGCCGTCTCCTTCCACTGGTCCGACGGC 52200 
WQAVLSANPALAVSFHWSDG 

52201 TCCCCGCCCGAGCAGGTGGTGGACCCCGACGCGCGCGTCACCGTCGACACGGCCGACTGG 52260 
SPPEQVVDPDARVTVDTADW 

52261 CGGGACCGCACCCCGGCCGAGCGGGACGATGCCTTCGCCCGCTTCCTGGACACCGACCGC 52320 
RDRTPAERDDAFARFLDTDR 

52321 GCGGCGGGCTTCGACCTCGCCCGCGCCCCGCTGATGCGGCTGACGCTCTTCCGCGAGGGC 52380 
AAGFDLARAPLMRLTLFREG 

52381 GAGCACGCGTACCGCTGCGTGTGGACCCACCACCACCTCGTCCTCGACGGCTGGTCCCAG 52440 
EHAYRCVWTHHHLVLDGWSQ 

52441 CAGCTCGTCCTGCGCGACGTCCTCGACTGCTACATGCGCCTGCGCGCCGGACGCGGCGCC 52500 
QLVLRDVLDCYMRLRAGRGA 

52501 GAGCCGCCCGCCCGGCCGTCCTTCACCGGTCATCTGCGCCGGCTGGAGCGGCAGGACGGG 52560 
EPPARPSFTGHLRRLERQDG 

52561 ATCGACGAGGAGTTCTGGCGCGACCACCTCGGCGGCCTGCCCGCACCCTCCCGCGTCGCC 52620 
IDEEFWRDHLGGLPAPSRVA 

52621 GGTCCCGGCTGCCGCGACGGCCGGGTGGTCGCCGTACGGCGCGCCGAGCACCGGCACCGG 52680 
GPGCRDGRVVAVRRAEHRHR 

52681 «ctccgo^ 52740 

52741 cccIcclcJJc^ 52800 

52861 LoccLLcIx^c^^ 52920 
52921 LllL^^ 



5304 



.„„ cclcl^^ 



GCGAULjAL.UjM^'-vj^v-'^^' 

ATT TEGETP mtwtvvtgaGGF (orf 20) 

5346 i tcatcggc^ 53520 

53521 JJJkLo^ 53580 

5370 i T ccx~^ 53760 

5376 1 ^S«^ 53820 

53821 53880 

53881 01^4?^ 53940 

5394 i gJJca^ 54000 

54 ooi cgL^ 54060 

54120 

54121 GGCTcLcCGCTCCGACGTOTTOGAACCGGTI^CATCGGCTCCGAGGAGCGCGTCGACA 54180 
LARSDVAEPVNIGSEERVDI 

54181 TCGCGTCGCTCGTCGAGCGGATCGCCGGGGTCGCCGGGAAGAAGGTGCGCTGCGCCTTCG 54240 
ASLVERIAGVAGKKVRCAFA 

54241 CCCCCGACCGCCCGGTCGGGCCCCGCGGGCGCGTCTCGGACAACACCCGCTGCCGCGAAC 54300 
PDRPVGPRGRVSDNTRCREL 

54301 TGCTCGGCTGGGCACCGGAGACGTCCCTCGCGGCCGGCCTGGAGCGCACCTACCCGTGGA 54360 
LGWAPETSLAAGLERTYPWI 

54361 TCGAGCGCCAGGTCCTCGCCGAGGCCGGGAGGGCCGATGCCTGAGCACCGCACACCGGTG 54420 

M (orf19) 

ERQVLAEAGRADA* 

54421 AAGGACCTCGGCCGGCTGCTGCTCGGGCACGCCGCGCGCTTCCGGGGCCGCGAGCTGCAG 54480 
KDLGRLLLGHAARFRGRELQ 

54481 GACGTCGCCACCCGGGCGCTGCGGGCCTCCGGCGGGGAGAACGCCTGGGTGGTGTCCGTC 54540 
DVATRALRASGGENAWVVSV 

54541 GTCAACACCAGTCTCCGCGCCCGCCAGGCCGTGGACCACGCGCTGCGGCTCGCCCCCCGC 54600 
VNTSLRARQAVDHALRLAPR 

54601 CGCGGGCTCTCCCGGCTGCGCTACCCGTTCTCCGCCGCCCACCACACGGCCACCCCGCCC 54660 
RGLSRLRYPFSAAHHTATPP 

54661 CGGACCCTGTCGCTGCTGTGCCCGACCCGCGAACGCGTCGGCAACGTCGAACGCTTCCTC 54720 
RTLSLLCPTRERVGNVERFL 

54721 GACAGCGTCGCCCGCACCGCCGCCGCGCCCGGCCGGATAGAGGCCCTCTTCTACGTCGAC 54780 
DSVARTAAAPGRIEALFYVD 

54781 GACGACGACCCCCAACTCCCTGCCTACCACGAGCTGTTCGAGCACGCCCGGTGGCGCTAC 54840 
DDDPQLPAYHELFEHARWRY 

54841 GGACGGATCGGCCGGTGCGCCCTGCACGTCGGCGCCCCCGTCGGCGTACCCCACGCCTGG 54900 
GRIGRCALHVGAPVGVPHAW 

54901 AACCACCTGGCCCGGAACGCGGCCGGCGACGTGCTGATGATGGCCAACGACGACCAGCTC 54960 
NHLARNAAGDVLMMANDDQL 

54961 TAGATCGACTACGGCTGGGACACCGCCCTCGACGCCCGCGTCACCGAACTGAGCGCCCTG 55020 
YIDYGWDTALDARVTELSAL 

55021 CACCCCGACGGCGTCCTGTGCCTGTACTTCGACGACGGCCAGTACCCCGAGGGCGGCTGC 55080 
HPDGVLCLYFDDGQYPEGGC 

55081 GACTTCCCGATGGTGACACGGCCCTGGTACGGCACCCTCGGCTACTTCACCCCGACGATC 55140 
DFPMVTRPWYGTLGYFTPTI 

55141 TTCCAGCAGTGGGAGGTCGAGAAGTGGGTCTTCGACATCGCCGACCGGCTGCACCGGCTC 55200 
FQQWEVEKWVFDIADRLHRL 

55201 TACCCCGTCCCCGGCGTCCTCGTCGAACACCGGCACTACCAGGACTACAAGGCACCCTTC 55260 
YPVPGVLVEHRHYQDYKAPF 

55261 GACGCCACCTACCAGCGGCACCGGATGACACGGGAGAAGTCCTTCGCCGACCACGCCCTG 55320 
DATYQRHRMTREKSFADHAL 

55321 TTCCTGCGCACCGAGCCGGACCGCGAGGCGGAGACGGACAGGCTGCGGGCCGTCATCGCC 55380 
FLRTEPDREAETDRLRAVIA 



55381 CGGGCAGGGAACACCCCGGACGCCGACCACGCCGACCATGCCGTTCACGACGCGGAGACC 55440 
RAGNTPDADHADHAVHDAET 

55441 TTCTGGTTCACCGGCCTCCTGCGCGAGTCCCACGCCAAGCTGCTCGCGGAACTCGACGAC 55500 
FWFTGLLRESHAKLLAELDD 

55501 GCGCCGGGCCCGGCCGCCGGAGCCGTGCTCTTCGCCGACGGCTCCTGGACCGGCGTCGCC 55560 
APGPAAGAVLFADGSWTGVA 

55561 TACCGCACCCACCCGCTGGCCACCGCCCTGCTCGCCTCGATCCCCGAGGCCACCCTCGAC 55620 
YRTHPLATALLASIPEATLD 

55621 TCCGGCCGCGCCGACCTCCTCGTCGTCCC^ 55680 
SGRADLLVVPPGASHHHPDG 

55681 ACCGTCGACTCCGCGTTCGGCTCCGACGCCGGCCTCCGCGTCCTGTTCGGACTGCGCGTG 55740 
TVDSAFGSDAGLRVLFGLRV 

55741 CCGGACGCCGCGCAACTCCGCGTCGGCGACGGCCCGGTGCCCTGGGGCAATGGGCAATGC 55800 
PDAAQLRVGDGPVPWGNGQC 

55801 CTGATCCACGACACCGCCGCACCGAGCACCCTGCGCAACGACGGCACCGAATCTCTGGCC 55860 
LIHDTAAPSTLRNDGTESLA 

55861 GCCCTCACCTTCGTGGTGCCGCGCCCGGCACCGGGGGAGTGAGGCCCGTGTGCGGCATCG 55920 
ALTFVVPRPAPGE* 

Ab MRPVCGIV (orf18) 

55921 TGGCGATCCGCTCCGCCGACGGCGGACTCGACGGCGGTGAACTCACCGCGCCGATGGCCG 55980 
AIRSADGGLDGGELTAPMAD 

55981 ACCTGCGCCCGCGCGGCCCCGACGGCGAAGGCACCTGGGTCTCGCCCACCGGCCGGGCCG 56040 
LRPRGPDGEGTWVSPTGRAA 

56041 CCCTCGGCCACACCCGGCTCGCCGTGATCGCCCCCGACGCCGGACGCCAGCCGGTCGCCG 56100 
LGHTRLAVIAPDAGRQPVAG 

56101 GCCCGGACGGCACCGTCCGGCTCGTCGTCAACGGCGAGTTCTACGGCTACCGGGAGATCC 56160 
PDGTVRLVVNGEFYGYREIR 

56161 GCGCGGAACTGCGCGCCGCCGGCTGCCGGTTCCGCACCGGCAGCGACAGCGAGATCGCCC 56220 
AELRAAGCRFRTGSDSEIAL 

56221 TCCACCTGTACCTGCGGGACGGCCGGCGGGCACTGGAGCGGCTGCGCGGCGAGTTCGCCT 56280 
HLYLRDGRRALERLRGEFAF 

56281 TCGTCCTCTGGGACGAACGCCGCGCCACCCTCTTCGCCGCCCGCGACCGGTTCGGCGTCA 56340 
VLWDERRATLFAARDRFGVK 

56341 AACCCCTCTACTACACCGAGCGCGACGGGCGGCTCTACGTCGCCTCGACGGTCAGGGCCC 56400 
PLYYTERDGRLYVASTVRAL 

56401 TGCTCTCCTGCGGCGCCCCCGCCCGCTGGGACACCGCCGCCTTCGCCGCGCACCTGCAGC 56460 
LSCGAPARWDTAAFAAHLQL 

56461 TCGGCCTGCCCCCCGACCGCACCCTCTTCGCCGGCATCCGGCAGCTCCCGCCCGGCTGCC 56520 
GLPPDRTLFAGIRQLPPGCH 

56521 ACCTCATCGCCGACGCCCACGGCACCCGCGTCACCCCCTACTGGGACCTCGACTACCCGC 56580 
LIADAHGTRVTPYWDLDYPP 

56581 CCGCCGGCGAACTCGCCGCCCGGGGAAGCCTGGACGACCACCTGGACGCGGTACGCGAAC 56640 
AGELAARGSLDDHLDAVRER 

56641 GGACCGACGAGGCCGTACGGTTGCGTACCGTCGCCGACGTGCCCCTCGCCTGCCACCTCA 56700 
TDEAVRLRTVADVPLACHLS 

56701 GCGGCGGCCTGGACTCCTCCGCCGTCGCCGCCTCCGCCGCCCGCCACACCCGGCTCACCG 56760 
GGLDSSAVAASAARHTRLTA 

56761 CCTTCACCGTCCGCTTCGACGACCCCGCCTTCGACGAGAGCGCCGTCGCCCGGCGCACCG 56820 
FTVRFDDPAFDESAVARRTA 

56821 CCGCCCACCTGGCCATCGACCACCGCGAAGTCGCCTCGGAACGCGCCCACTTCGCGGACC 56880 
AHLAIDHREVASERAHFADH 

56881 ACCTGCGGGACGTCGTCCGCGCCGGCGAGATGGTGCAGGAGAACTCGCACGGCATCGCCC 56940 
LRDVVRAGEMVQENSHGIAR 

56941 GGTACCTGCACAGCGCGCACATCAAGAAGGCGGGATTCACCGCCGTCCTCGCCGGGGAGG 57000 
YLHSAHIKKAGFTAVLAGEG 

57001 GCGGGGACGAACTGTTCCTCGGCTACCCCCAGTTCCGCAAGGACCTGACGCTCAGCCTGT 57060 
GDELFLGYPQFRKDLTLSLS 
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5,061 CCGCCGACGCCCGCGACAAGGCCGACCGC^CTACGCCCGGCTGGTCGCGGCCGGGCTCC 

TGCCGCCGTACCTGCG^^ 
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57120 



57121 



57181 



57241 



, 57180 

p p Y L R T L L G T L G F L P 'T"w I V D 

ACCGCCACCTGG^ *™ 

AACTGGCCC^^ ™™ 



RAD 



A A P L L 



57360 



57301 GGCGCGCCCCG^^ 

TGCTCGCCGCCGAACGCCTCGACGCGGCC^GGCCGTCGAGGTGCGGCTGCCCCT 57420 



57361 



57421 



57481 



LAAERLDAAQA 
ACC^CCACCTC^CGAC^ ™° 
CCGGCAAGTACCCGCTGCGGGCCGCC^TGCGCCACCGGCTGCCGCGCGAGGTGACCGAGG 57540 



G K 



"pLRAAMRHRL 



57541 GCCGCAAACAGGGCTTCCTCGCACCTCCGATGGCCGACGACGACACCCTCCTCGAC^ 57600 
RKQGFLAPPMAU^" 



57660 



57601 



57661 



TGCGCGAACGCCTCGCCGGACCGG^CGCG 

TCCGCGCCCTG^ 

57721 AACTCCTCCAACTCGTCGCGAGCACCGCCGAACTGGCCGACGAGTTCGGCCTCACC^ 
LLQLVASTAt-ij^^^ 
CCCCCAGCGGGCAGAAAGGCGGCAACGGTGGCTGACCTCGATCCCGGCACGCTCTCCGAG 57840 



57720 



57780 



57781 



P S G Q 



K G G N G G 



57841 GCCGAGCTGACCGCCCGGATCGCCGCCCTGTCCCCCGAACGCCGGGCGGCGTTCGAGAAG 57900 
57901 ATGCTGCACGGCGCCGCGCACCCCCGCCCCGGCATCCCGC^ 57960^ 



57961 



MLHGAAHPRPGIPRR 

CCGGCCTCCTACGGCCAGGAACGCCTGTGGCTGCTCACCGGGCTGCTGCCCACCG^ 58020 
PASYGQERLWLLTGLLP1AI 

58 0 2 1 AACTACGCCACCGCCCTGCGGCTGCX3CG^CGACCTGTCCGTCCCCGCGCTCCGCG 58080 
NYATALRLRGDLbVfM 

58081 CTGCGCGGCATCGTCCGCCGCCACGAGGTGCTGCG "140 



58141 



GRSADTGRLMREEAKKt-r 
58261 GAGCACGGGCCGCTCCTGCGGC^ 5832 ° 



58321 



E H G P L L 
CTGCTGGCCGTCCACCACGCCGTCACCGAC^^^ 

LLAVHHAVTDGWSNGVLVlt, 



58381 CTCGCCACCGGCTACCGGGAACTGCGCGCCGGACGCCCCGACCGGCTK3CCCGCCCCGCCG 
LATGYRELRAGRPDR Kl ' M 
GTCCAGTACGGCGACTACGCGCACTCGCAGCGCGAGCGGCT 



58380 



58440 



58500 



58441 



V Q Y G D Y A 
58501 GCCCTGGAGGACTACTGGCGCACCGCCGTACGCGACCTGCCCAGGACGGACCTGCCCACC 58560 
WRTAVRDLfK 1 



A L E D Y 
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58620 



58561 GACCGCCCCCGCCCCGCCGCCCGGCGCGGCGAGGGCGCCAACCACGCCCTGCTGCTCTCG 
DRPRPAARRGEGANHALLLS 

58621 CCGGAGCTGACCGGCCGGCTCGCCGACCTGCGCCGCCGCGAGGGCGGGTCGCTGTTCATG 
PELTGRLADLRRREGGSLFM 



58801 GTCAACGTCCTGCTGCTGCCCTTCGAGACCGGCGGCCGGACCTCCTTCGCCGAGCTGTGG 
VNVLLLPFETGGRTSFAELW 

58861 CGGCGGGTCCGCGGCCGGCTGGTGGAGGCGTACGCCCACCAGGAACTGCCGCTGGAGAAG 
RRVRGRLVEAYAHQELPLEK 

58921 GCCCTGGAGCTGCTGCGCGCCGACGGCACCGCCCCCGCCGACCCGCCGGTCGGCGTGGTC 
ALELLRADGTAPADPPVGVV 

58981 TGCGTCGCCCAGCAGCCCGCCCCCGCGATCACCCTGCCCGGACTCGACGCGAGCGTCGAG 
CVAQQPAPAITLPGLDASVE 

59041 GACGTCGACCTGGGCACCGCCCAGTTCGACCTCGTCGTCGAGGTGCGCGAACGGCCGGAA 
DVDLGTAQFDLVVEVRERPE 



59161 CTCGCCGACCACGTGCACGCCGTCCTCGACCAGGCCGCCGCCGACCCCACCCTGCCCTGT 
LADHVHAVLDQAAADPTLPC 



59461 GACGCGGTGACCGCCACCCTCGCCGCCCTCAAGGCCGGCGCCGCGTACGTACCCCTCGAC 
DAVTATLAALKAGAAYVPLD 



58680 



58681 CTCGTGCTCTCCGCGCTCCTGGTCGTCCTGCGTGGCACCGGCGGCCGGGACCGGCTCGCC 58740 
LVLSALLVVLRGTGGRDRLA 

58741 GTCGGCACCCTCGTCGCCGGCCGCACCCGCCCCGAACTCGAGCCGCTCATCGGCTACTTC 58800 
VGTLVAGRTRPELEPLIGYF 



58860 



58920 



58980 



59040 



59100 



59101 GGCGTGCAGATCGCCTTCCAGTACGACCGGGACCTGTTCGACGCGGCCACGGTCCGGCTC 59160 
GVQIAFQYDRDLFDAATVRL 



59220 



59221 GCCGAGCTGCCCGCCCCGCCGGCCCCCGCGGCCCCGGCCCGCACGGCCGGCGCCACGACG 59280 
AELPAPPAPAAPARTAGATT 

59281 CTGCACGCCCTGTTCGAGTCCCGCGCCGCGAAGAGCCCCGACGCGGTCGCCCTCGTCGAC 59340 
LHALFESRAAKSPDAVALVD 

59341 GGCGGCCACCGCGTCACCTACCGGACCCTCAACACCCGCGCCAACCGGCTCGCCCGCCAC 59400 
GGHRVTYRTLNTRANRLARH 

59401 CTGCGCGCGGTCGGCGTGCGTACCGAGGACCGGGTGGCGCTGCGCCTGCCCCGCGGCACC 59460 
LRAVGVRTEDRVALRLPRGT 



59520 



59521 CCCGCCCTCCCCGAGGAACGGCTGACCCGCGTCCTCGCCGACGCCCGCCCCGCCGTGGTC 59580 
PALPEERLTRVLADARPAVV 

59581 CTCACCCCCGCGTATCTGCACGACCGGTCCGCCGAGATCACCGCCCACGCCGGCCATGAC 59640 
LTPAYLHDRSAEITAHAGHD 

59641 CTCAACCTCCCCGTCCACCCCGACAACCTCGCCTACCTCCTCCACACCTCCGGATCCACC 59700 
LNLPVHPDNLAYLLHTSGST 

59701 GGCACCCCCAAGGGCGTCCTCGGCACCCACCGGGGCGCGGTCAACCGCGTCGACTGGATG 59760 
GTPKGVLGTHRGAVNRVDWM 

59761 AGCACCGCGTACCCGTTCCGGACCGGCGACGTGGCCGTCGCCCGCACCGCGCCCGGCTTC 59820 
STAYPFRTGDVAVARTAPGF 

59821 GTCGACGCGGTCTGGGAACTCTTCGGCCCCCTGGCCGCCGGCGTCCCCCTCGTCCTCCTG 59880 
VDAVWELFGPLAAGVPLVLL 

59881 CCGACCGACGAGGCGCGCGACCCGGCCCTGCTGACGGCGGCGCTGGAACGGCACCGGGTG 59940 
PTDEARDPALLTAALERHRV 

59941 AGCCGGATGGTGACGGTCCCGTCGCTGCTGACCATGCTCCTGGACGAGTCCGCCCGCGCG 60000 
SRMVTVPSLLTMLLDESARA 

60001 ACGGACCTCGGCACCCGCCTGGCCTGCCTCCGCACCTGGATCACCAGCGGCGAGCCCCTG 60060 
TDLGTRLACLRTWITSGEPL 

60061 CCGCCCGCGCTCGCCCGGCGGTTCCACGACCGCCTGCCCGGCCGCACCCTGCTGAACCTG 60120 
PPALARRFHDRLPGRTLLNL 

60121 TACGGCTCCTCCGAGACCGCCGCCGACGCCACCGCGGCCCGCATCGACCCGGCGCCCGGG 60180 
YGSSETAADATAARIDPAPG 

60181 ACTGCGCTCCCGGAGCGGTCCCCGATCGGCACGCCCATCACCGGCGTCAGCGCCCTCGTC 60240 
TALPERSPIGTPITGVSALV 

60241 CGCGGCCCGGACCTGCGCCCGCTGCCCGCGCTGATGCCCGGCGAGCTGTACGCCGGGGGC 60300 
RGPDLRPLPALMPGELYAGG 

60301 GCGTGCGTGGCCCGCGGCTACCACGCCCGTCCGGCCGAGACCGCCGCGGCGTTCCCGCCG 60360 
ACVARGYHARPAETAAAFPP 

60361 GATCCCGACGGCGGGCCCGGCGCCCGGATGTTCCGTACCGGTGACAGGGCCCGGCTGCGG 60420 
DPDGGPGARMFRTGDRARLR 

60421 GCCGACGGCCGGCTGGAACTCCTGGGGCGCGTGGACCGGCAGGTGCAGATCCGCGGCCAG 60480 
ADGRLELLGRVDRQVQIRGQ 

60481 CGCGCCGAGCCCGGCGAGGTCGAACACGCCCTGCTGGCCCACCCGGCCGTACGGGCCGCC 60540 
RAEPGEVEHALLAHPAVRAA 

60541 GCCGTCACGGCGAACCCCGACGCCACCGGCCTGTGGGCGTACGTGCGGCTCGCTCCCGGC 60600 
AVTANPDATGLWAYVRLAPG 

60601 CCGTTCGCCGCCGGCTCCCCCCAGACCGAGCTGACCGCCTTCCTGCGCCGCACGCTCCCT 60660 
PFAAGSPQTELTAFLRRTLP 

60661 GCCCACCTCGTGCCCACCGCCGTCACCGTCCTGGACGAGCTGCCGGTGACCGCGCACGGC 60720 
AHLVPTAVTVLDELPVTAHG 

60721 AAGACCGACCACGCGCGGCTGCCCGCCCCCGACCCCCGGGCCGGGCGCCCCGCCCCGACC 60780 
KTDHARLPAPDPRAGRPAPT 

60781 GCCCCCCGCACCCCCACCGAGCGTACGGTCGCCGACGTCTTCGCCGGGGTGCTCGGCCTG 60840 
APRTPTERTVADVFAGVLGL 

60841 GAGGGGCCGGTCGGCGCGCACGACGACTTCTTCCTCCTCGGCGGGCACTCCCTCCTCGCC 60900 
EGPVGAHDDFFLLGGHSLLA 

60901 GCCCGCAGTCGCGGCGGAACTCCGCGCCCGCCGCGGCGTCCGGATCGGGCTGAGCGACGT 60960 
ARSRGGTPRPPRRPDRAERR 

60961 CTTCGCGGCCCCCACCGTCGCCGCAGCGTCGCCGCCCGGACCGACGCCGCCCGGCCCGGC 61020 
LRGPHRRRSVAARTDAARPG 

61021 ACCGGCCCCGAGCACACCCCGTTCGTCACCGACCCCGGCGCCCGGCACGAGCCGTTCCCG 61080 
TGPEHTPFVTDPGARHEPFP 

61081 CTCACCGACGTCCAGCGGGCCTACTACGTGGGACGCGAGGGCGGGTTCGCCCTCGGCGGC 61140 
LTDVQRAYYVGREGGFALGG 

61141 GTCTCCACCCACGCCTACCTGGAGATCGAGGCCCCGCGGATCGACGTCGCACGGTTTACC 61200 
VSTHAYLEIEAPRIDVARFT 

61201 GGCGCGCTGCGCGGGGTGATCGCCCGGCACCCCATGCTGCGCGCCGTGATCCGTCCCGAC 61260 
GALRGVIARHPMLRAVIRPD 

61261 GGGCTCCAGCAGGTGCTCACCGACGTCCCCCCGTACGACGTGGCCGTGCACGACCTGCGC 61320 
GLQQVLTDVPPYDVAVHDLR 

61321 GACCTGGACGAGCCCGCGCGGCAGCGCCGACGCGCCGCGCTGCGCGAGGAGATGTCCCAC 61380 
DLDEPARQRRRAALREEMSH 

61381 CAGGTGGTGCCCGCCGACCTCTGGCCCCTGTTCGACGTCCGCGTCTCCCTCGGCCCCACG 61440 
QVVPADLWPLFDVRVSLGPT 

61441 GACGCCCTCGTCCACGTGGGGGTGGACGCGCTGATCTGCGACGCCCACAGCTTCGGCCTC 61500 
DALVHVGVDALICDAHSFGL 

61501 GTCCTGGCCGAACTCGCGGCCCGTTACGCCGACCCCGCACGCCGCTTCCCGCCCCTGACG 61560 
VLAELAARYADPARRFPPLT 

61561 GCGGACTTCCGGGACCACGTCCTCCATCAGGAGG03CTCTOCGGAACCGCCGAGTACGCG 61620 
ADFRDHVLHQEALRGTAEYA 

61621 GCGGCGGAGCGGTACTGGCGCGAACGCCTGCCCGAGCTGCCGCCCGGCCCCGAACTGCCC 61680 
AAERYWRERLPELPPGPELP 

61681 CTGGCCGTCGCGCCCGAGACCCTCGGCACCCCGCGCTTCACCCGCCGCTCCGGCCGGCTG 61740 
LAVAPETLGTPRFTRRSGRL 

61741 GACGCGGCCTCCTGGACGGCGGTCAAGGACCGGGCCCGCCGCGCCGGGCTCAGCCCCTCC 61800 
DAASWTAVKDRARRAGLSPS 

61801 GGCGTACTGCTGGCGGCGTTCGCCGAGGTGATCACCGCGTGGAGCGGCCGTCCGCGCTAC 61860 
GVLLAAFAEVITAWSGRPRY 

61861 TCGCTGATGCTGACGGTCTTCGACCGCCCGCCGCTCCACCCGGACCTCGGGCGGATCGTC 61920 
SLMLTVFDRPPLHPDLGRIV 

61921 GGCGACTTCACCTCGCTCAGCCTGCTGGAGGTCGACCACAGTCGGCCCGGCGACTTCACC 61980 
GDFTSLSLLEVDHSRPGDFT 

61981 GACAGGGCCCGCGCCCTCCAGCGCCGCCTGTGGCAGGACCTCGACCACCTGGCGGTCGGC 62040 
DRARALQRRLWQDLDHLAVG 

62041 GGCGTGATOGTGACACGGGAACGGGCGCTGCGCCACGACGCCCX3ACCCGGTCTGCTCACA 62100 
GVTVTRERALRHDARPGLLT 

62101 CCCGTCGTCTTCACCTCCGACCTGCCTGTCGGCGAGACCGCGGCCGAGGACGCGGACGGG 62160 
PVVFTSDLPVGETAAEDADG 

62161 GGAGAGGGATGGGCGCTCGGAGAGCCCGTCTACGGCGTCAGCCAGACCCCGCAGGTCCAT 62220 
GEGWALGEPVYGVSQTPQVH 

62221 CTCGACCATCAAGTCGCCGAAGACCGAGGGGAGTTGGTCTTCAACTGGGACGCCGTGGAA 62280 
LDHQVAEDRGELVFNWDAVE 

62281 GACCTGTTCGCCCCGGGCGCCCTGGACGCCATGTTCGCCGCCTACACCGCCTCGCTGACC 62340 
DLFAPGALDAMFAAYTASLT 

62341 CGCCTGGCCCGGAGCCCCGAAGCCTGGCGGCGGCCCGGCACGCCGCCGCTGCCCACCGCC 62400 
RLARSPEAWRRPGTPPLPTA 

62401 CAGGCGGCCGTGCGCCGGCGCACCGCCGCGACCC^GGCGCCCCTGCCCGCCCGCCTGCTG 62460 
QAAVRRRTAATEAPLPARLL 

62461 CACGAGGCCGTCGGCGACGCGGCCCGGCGCCACGCCGACCTGACCGCCCTGGTCGACGGC 62520 
HEAVGDAARRHADLTALVDG 

62521 GACACCCGGATGACCTACCGGCGACTGACCGAGCACGCCCGGCGCGTCGGCCGCACGCTG 62580 
DTRMTYRRLTEHARRVGRTL 

62581 CGCCGCCTCGGCGCCCGCCCCGGCCGCCTGGTCCCGGTGGTCGCCCGCAAGGGGTGGCGG 62640 
RRLGARPGRLVPVVARKGWR 

62641 CAGGCCGTCGCCGCGCTGGGCGTCCTGGAGTCGGGGGCGGCGTACCTGCCCCTGGACCCC 62700 
QAVAALGVLESGAAYLPLDP 

62701 GAACTGCCCGCOSAACGGCTCGTCCACCTCGTACOTCGCGCCGAAGCCGCCCTCCTCCTC 62760 
ELPAERLVHLVRRAEAALLL 

62761 ACCGAACGCGCCCTGCTGGACACGCTCGCCGTCCCCGTCGGCGTCACCGTGCTCGCGGTG 62820 
TERALLDTLAVPVGVTVLAV 

62821 GACGACGACGCGGCCCTCGACGCCGACGGCGGCCCGCTGCAGAGCGTGCAGAACCTCACC 62880 
DDDAALDADGGPLQSVQNLT 

62881 GACCTGGCGTACACCATCTTCACCTCGGGCTCCACCGGCGAACCCAAGGGCGTCATGATC 62940 
DLAYTIFTSGSTGEPKGVMI 

62941 GACCACCTCGGCGCGGCCAACACCCTGGAATGCGTCAACCGCCGCTTCGGCACCGGCCCC 63000 
DHLGAANTLECVNRRFGTGP 

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6300! GGCGACGCGGTCCTCGCCGTCTCCrCCCCGAGOTCGACCTCGCCGTCTACGACCTGTTC 63060 
GDAVLAVSSPSFDLAViu 

63061 GGCGTGCTGGCCGCCGGCGGCACCGTGGTOT 

63121 GGACACTGGGCCGAGCTGATCCGGCGCGAGCGGGTCACCCTGTGGAACTCCGTCCCCGCG 

GHWAELlRR EKVAiJ 
63181 CTGGGCACCCTGCTCACCGAGTAC^ 

63241 CGGGCGGTGCTCCTCAGCGGCGACTGGATCCCCCCcggacCgCCcgaocGGATCCGCGCC 

ravllsgdwipj-»« 



63301 CIOTCMOCCCWK^^ 

63361 TCGGTCTGGTACGAGATCG^GAAGGTGCACGAGG^ 

63421 CCCATGGCCA^ 

63481 GTGCCCGGC^ 

63541 «ACMMCTC^ 

63601 GGGG^CGCC^^ 

63661 CAGGJGAAGATCGG^^ 

63721 CTGCCCGACGTCGCCGCO^^ 

LPDVAAGAVIATijUf^vj 

63781 CTCGTCGGCTTC^^ 

63841 CAACTCGCCCGGCGGCTGCCCGCCTACATGGTCCCCACGACCCTGCTGCCCC 

QLARRLP AYMVPi 

63901 CTGCCGCTGACCGCCAACGGCAAGGTCGACCGGGCCGCACTCCW 

LPLTANGKVDRAALQRl^Vfu 

63961 OGCjaCOG^ 

64021 GTGCCCGGCTGGCTCGCC^ 

64081 GACGCGAACTTCTTCGCCCTCGGCGGCACCTCCCGGGTCGCGATCACCCTGGTCACCCGG 

DANFFALGGTSRVAX 

64141 ATCGAGGCCCGACTCGCCGTCCGC^TGCCCCTCGCCCGCCTCTTCGACGCCra 

IEARLAVRVPLARLFDAK 11 ' 

64201 GGCGGCCTCGCCGAGACGATCGCCGAACTGTCGGCCGCCGCCGAGGAGGAGC03GCACCC 
GGLAETIAELSAAAEEEfAr 

64261 GCCGAGCCCGTGTACGCCCCCGACCC^^ 

AEPVYAPDPATRHEft r u x 

64321 AT^WOXX^^ 

64381 CACACCTACCTCGAACTCGACGTCGAGGACCTCGACC^ 

64441 CGCCGGCTGATCGACCGCCACGACGCCCT^ 
r R L IDRHDALRL 

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63180 
63240 
63300 
63360 
63420 
63480 
63540 
63600 
63660 
63720 
63780 
63840 
63900 
63960 
64020 
64080 
64140 
64200 
64260 
64320 
64380 
64440 
64500 
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64501 CAGATCCTCGGCGACGTACMCCGTACCTCCTCGCC^CACCGACCTGCGGGGCAGGGCG 64560 
QI LGDyPPYLLAHTDLRGKM 

6 4561 GACGCCGAG^ 64620 

64621 TCCCGCTGGCCGCTGTTCGACGTACGGACCCACCGCCTGGACGACGTCCGCACCCGGCTG 64680 

SRW P LFDVRTHRLDDVK x k u 

64681 CACCTGAGCTTGGACCTGCTCATCGCCGACGCCCACAGCCT 64740 

HLSLDLLlADAnb 

64741 CTGCTCACCTTCTA^ 

64801 GACTACGTCCTGGCCGTCCGCGCCCACGCCGAGGGCGAGCCGCGCCGCCGCGCCCTCGAC 

- LAVRAHAEGEPKKkauw 



D Y 

64861 CACTGGCGGGCCCGGCTC^CCGA^ 

HWRARLADLPGPPGLPL RCK 

64921 CCCGAGGAGCTGACCGCGCCGCGGTTCGCCCGCCTCACCA^ 

PEELTAPRFARLTT01.Oi'L>« 

64 981 TGGGCACGGCTGCGGCGCGCCGCGGCGGCCGCCGAACTC^CCCCGGCCG^CTGATCTCC 
WARLRRAAAAAELTPAAbJ."-^ 

65041 GCCGCCTTCTGCGACGTCCTCGCCCAGTC^ 
J\ A PCDVLAQWSD 

65101 ACCACCTTCCACCGCCCCGCCCTGCTCCCCGGCGTGGACGACCTCGTCGGCGAC^ 

TTFHRPALLPGVDDLVGDr 1 

65161 accacgaccctgctc«;ggtcgacgg^^ 

TTTLLGVDGEGDTFRDRAKK 

65221 CTCCAGGACCGCATCTGGGAGGACCTCGAACACCG^ 

LQDRlWEDLEHRVV&o 

65281 cgcatgctgcgc^^ 65340 



64800 
64860 
64920 
64980 
65040 
65100 
65160 
65220 
65280 



65341 AGCACCCTGCGGGCCGCCGGCCCC^CCCCC^^ 65400 
STLRAAGPAPRTAPPAWRVK^ 

65401 CCCGGCTACGCGATCAGCCAGACCCCGC^GGTCCTGCrCGACCATCAGGTCAGC^AGAGC 65460 
PGYAISQTPQVLLDHQVSh.b 

65461 GACGGCCGACTGGTCTGCACCTGGGACTACGTCGCGGACGCCTACCCGCCCGGGCTGATC 65520 
DGRLVCTWDYVADAYPPGLI 

65521 GAGGCCATG^CGGGGCCTTCGAGGCGCTCCTCGCCTCGOTCGCCGGTcAcGACGACGAC 65580 
EAMFGAFEALLASLAGHDUU 

65581 GCCGGCCACGACGACGACGCCGGCCACGACGACGGCCCC^CCACGACGACGGCCC 65640 
AGHDDDAGHDDGPGHDDfaf^ 

65641 CACGACGACGGCCCCGGCCACGACGA^ 65700 
HDDGPGHDDGPGHDDGPGKL. 

65701 GACAGTGCCGATCACGGCCACAGTGCCACGCACGACGACAGCGCCGCCCGAAACGACAGA 65760 
DSADHGHSATHDDSAARNDR 

65761 GAGGGAGGTGGACCGGAGTGACGAGCGCCCGGCCCACGCCGACACTGCTCCCCGCCGACC 65820 

E G ° G P ' M* T S A R P T P T L L P A D Q (.rfl6) 

65821 AGCGGGAGCTGCTGCGGATGATGAACGACCGCACCGCACCCGTGCCCGCGCACACCCTCA 65880 
RELLRMMNDRTAPVPAHTlji 

65881 CCGCCCAACTGGCCGACGCCGCGCGCACG^CGACCGGGCTCTGGCACTGGTGGCACCGG 65940 
AQLADAARTHDRALALVAPl. 

65941 GTCTGACACTGAGC^CGCCGAACTGGACGCCCGGGCGGCCGCGGTGGCCGCCCGGCTCA 66000 
66001 CCGCCGCGGGCGTCAT^ 



66060 
66120 



A G 

66061 AGGTCGTGGGCGCCCTGGCCGCGCTCCGCGCCGGAGCCGTCTGCCTGCCCGTCGCCC^ 

VVGALAALRALfrtv 

66121 GGCTGCCCCGGCCCGCCCGC^CAGCA^ « 180 

66181 CCCAGTCCTGGCTCACCCAGCGCATCGACTGGCCGCAGGAACTGCCCGTCCTCTCCG^ 66240 

QSWLTQRIDWfU^" 
66241 ACGAACCCGGGCCGCCGGTACCACCCACCACCG^CCCGGCCGA *«O0 

66301 ACGCCGCCTACCGGCTG^CGCCCCCGTCAGCCACCGCGCGATCACCACCG^ 
AAYRLDAPVSHKAX 

66361 AGATCGACCGCGCCTTCCGCGTCG^A^ 
66421 ACTCGCCGCTCGC^ 

66481 TCACCCGGGACATCGACCTGCGCGATCCCGGAGCCCTGCACGAGGCGCTGCGCACC^CG 

T R D I DLRDPGAJj 
66541 GCGTCACCCTCTGGCACTCGCCGCC^ 

66601 ACCGGGGCGGC^U^ACTGCCCGAGTCGCTCCGGCTGGTGCTGCTCGGCGGTOAATOCCTCG 
RGGKLPESLRLVLbO^c" 

66661 ACCCCG^CTCG^ 
66721 TCTCCTCGGCCACCCCGTCCG^ 

66781 CGGAATGGCGCTCGGTCCCCGTCGGCGCGCCCCTGCCCAACCAGCGGGCGCACATCCTGT 
EWRSVPVGAPLPNQRAHl 

66841 CCGAGACCCTGCGGCCCTGCCCGGTCTGGGTOVCCGGCCGCCTCCACTACGGCGGCGTCG 
ETLR PCPVWVTGRLHYGGVA 

66901 CCGCCGAGCCCCCCACCGGAGAGGAGCACX^ 

AEPPTGEEHAPATVPHfc- 

66961 GCGAACCGCTGCTGCGCACCGGGCTGTTCGCCCGCCTGCTGCC 

EPLLRTGLFARLLPEOliiu 

67021 TCGTCGGCGACGAGACCGCCCGGATCAGCGTCCGCGACW "080 

67081 CCGAGACCGCCCTCGCCGCCCACGAGGACGTGCACTCCGCCGTGGTCGTCCCCCT 67140 

ETALAAHEDVHfaAV v 

67141 GGGGAGACGAGTCGCTCGCGCGGGTACGGCTCCACCCCGGCGCCACGGCCGGCCCCGACG 67200 
GDESLARVRLHPGATAur 4 uB. 

67201 AACTCCTCGCCCATCTGCGCCGCAAGGTCTC^ 67260 
LLAHLRRKVSPVLLP&HJ-c 

67261 TGGGCGGTCCGCTGCCGCT^CCCGGGACGGGCGCGTG «™0 

67321 AGGCCCCCGCCCCCGCT^ 67380 

6 7381 luLoO^^ 67440 
E 



66360 

66420 

66460 

66540 

66600 

66660 

66720 

66780 

66840 

66900 

66960 

67020 



WO 00/40704 



PCT/US00/00445 



67441 



67501 



67561 



67621 



67681 



67741 



67801 



CCGTCGAACCCGATATGAACCTGCTCGACGCCGGTGCCACCTCTOTCGAACTCGTCCGCC 6750C 

LLDAGATSVEijVKu 



p D M N 



TGGCGACCGCTCTGGAGGAGGAACTCOTCCTCGACACC^ 67560 

TCCCGTCGGTCGCCGTGATCGTCGGCCGCCACCTCGGCCGCCGGACGGCACCACCGGCCC 67620 
PSVAVIVGRHLGRK i a r r « 

GGGACCCCCTGCCGCCCGCGTCCG^ 67680 

CCGCGCCCGGACCCGTGCCGCCC^^ 67740 



A P G P V P 



CGTCCGAGTCCTCACCGCTCGC^CC^CCCGCACCCGGGCCCGTGCCACCCACGCCCCT 67800 
SESSPLAPPAPGPVffir 

CGCCCGCCTCCGTCCCGCCCGCGTCCGGC*^ 67860 

PPASGAAPHVPFAr r 



A S 



67861 CACCCATCCCCGCGCCCT^ 



PSVPPAPRPQ 



67921 



67981 



68041 



68101 



68161 



68221 



68281 



68341 



67920 



67980 



68040 



gcatcggcgcccgccaggcgTTC^ 

CCACCGACGGCGTCGCCCTCAGCGGCCCGGACGACCACCACCTCACCGCCCGTCGCA^ 
TDGVALSGPDDHHLTARKi" 

ACCACCGCT^CGACCCCGGCCCCGTGACGCTGCCGGACCTGGCCGCCCTCCTCGGGGCCC 68100 
HRFDPGPVTLPDLAALLGAl. 

TCCGCCGGGTCCGCGGCCCGGGAGGCGAACCCAAATACGCCTATCCGTCGGCCGGTTCCT 68160 
RRVRGPGGEPKYAYPSAOsa 

CCTACCCCGTCCAGACCTACCTGCTCGTCCACCCGGGGAAGGTGACCGGACTGCCCGGCG 68220 
YPVQTYLLVHPGKVTGLPQ^ 

GCAGCC*CTACGTCC*CCCCGCGCGCAAC^ 68280 
SHYVHPARNRLVS IDP l Pi i u 

TGCCCGCCGACGCGCACGCCGAGATCAACCGCGCCGCCTACGGGGAGGCGGCCTTCTCCC 68340 
PADAHAEINRAAYGEAAFSb 

TCTACCTCATCGCCGCGATCGACGCGATC^CACCGCTCTACGGCGATCTCTCCTGGGACT 68400 
YLIAAIDAITPLYGDLSWDh 

68401 TCACCGTCTTCGAGGCCGGTGCCATGACCCAG^GCTGATCCGGACCGCCGTCGGCACCG 68460 

TVFEAGAMTQLLMRTAVO 

68461 GCATCGGCCTGTGCCCCGTCGGCACGATGGACCCCGCGCCGCTGCGCCGCGCGTTCGCCC 68520 
IGLCPVGTMDPAPLRR A1? 

TCAOOSACC^^ 68580 
TDRHRFVHALLGGRPRTEAF 



68521 



68581 



68641 



68701 



68761 



68821 



68640 
(orf15) 



68700 



68760 



CGTGAACCGGCACGGCCCCCTGGCGGGCCGGCGGCAGAGCGTCGACACCCGCAGCGCCGC 
MNRHGPLAGRRQSVDTRSAA 

GTGGGTGGCGCCGACGGGCACCCCGGGGCTGCCGCTGGAGGTGGCCGCCACCCGGGACGG 
WVAPTGTPGLPLEVAATRDO 

CGTCGACCCGGCCGAATGGGCCCGCACCCACCTCGACACCGTCACCGGCTGGCTGCACCG 
VDPAEWARTHLDTVTGWIjHK 

TCACGGAGCCGTCCTGTTCCGCGGCTTCGGCGTCGGCCTCGACGGCWCGGCGACGTCGT 68820 
HGAVLFRGFGVGLDGFGUVV 

CCACGCCCTGGCCGGATCCCCCGAGGCGTACGTCGAACGGTCGTCGCCGCGCACCGCCCT 68880 
HALAGSPEAYVERSSPRTAU 

68881 CGGGCATCACCTCTACACCGCCACCGACCACCCCGCCGACCAGCCCATCCCCCCGCACAA 68940 
GHHLYTATDHPADQPIPPHN 

68941 CGAGAACTCCTACCAACTCCGCTTCCCCGGACGGCTGGTCTTCGGCTGCCTCACCCCGGC 69000 
ENSYQLRFPGRLVFGCLTPA 

69001 CCGGACCGGCGGCGCGACCCCGCTCGCCGACACCCGGCGCGTCCTGGGCCGCCTCGACCC 69060 
RTGGATPLADTRRVLGRLDP 

69061 CGCCCTCGTCGCCGCCTTCGCCCGCCGCGGGGTGCTCTACCAGCGCAACTACGGCGACGG 69120 
ALVAAFARRGVLYQRNYGDG 

69121 GATCGGCATGTCCTGGCAGGACGCCTTCCAGACCCGCGACAAGGCGGCCGTCACCGCCTA 69180 
IGMSWQDAFQTRDKAAVTAY 

69181 CTGCGCCGCCCGCCGCGTCGACGTCGAATGGAAACCCGACGGCGGGCTGCGGACCACCCA 69240 
CAARRVDVEWKPDGGLRTTQ 

69241 GGTCCGCCCCGCCCTCGCCGTCCACCCGGCGACGGGGGAGCGGGTGTGGTTCAACCACGC 69300 
VRPALAVHPATGERVWFNHA 

69301 CGCGTTCTTCCACGTCTCCGCCCGGCCGCCCGCGCTGCGGGACGCCCTGCTGGCCCAGTT 69360 
AFFHVSARPPALRDALLAQF 

69361 CGACGAACGCGACCTGCCGAGCCACTCCTGCTACGGCGACGGCCGGCCCATCGAACCCGC 69420 
DERDLPSHSCYGDGRPIEPA 

69421 CGTCATGGAGGAACTGCACCACGCCTACGCCGCCGAACTGGTGGCGCCCGCCTGGCGGGC 69480 
VMEELHHAYAAELVAPAWRA 

69481 CGGCGACGTCCTCCTCGTCGACAACCTCCTCACCGCGCACGGCAGGGAACCCTTCACCGG 69540 
GDVLLVDNLLTAHGREPFTG 

69541 CGAACGCCGCGTCGTCGTCGGCATGGCACAGCCGCTGGACTGGGACGAGGTGAGCGCGTG 69600 
ERRVVVGMAQPLDWDEVSA* 

M (orf14) 

69601 ACCGCCCCCGGCACACCGCTGCCCGCGACCTTCGTCCAGCGCGGCCTGTGGCCGTCCACT 69660 
TAPGTPLPATFVQRGLWPST 

69661 CGCCACGCCCGCCCGGCGGAGGTCACCCACGTCCGCGCCCTGCGCCTGACCGGGGACACC 69720 
RHARPAEVTHVRALRLTGDT 

69721 GACACGGCGCGGCTCACCGAGGCCGTCCGGCGGGTCACCGCCGCCCTCCCCGCCCTCACC 69780 
DTARLTEAVRRVTAALPALT 

69781 GCCGAACTCTCCGGCGACGAGGAACCCCGCCTGACCCTCCGGCCGGACGCCCCCGAGGTC 69840 
AELSGDEEPRLTLRPDAPEV 

69841 ACCCCGGTCGACCTGCGCGGAGCCCCGTCCGCCGGACGCGACGCCGTCTGCGTGGCGCTG 69900 
TPVDLRGAPSAGRDAVCVAL 

69901 CTGCGCGCCGACCGGGACCACCCTCGCGCCGGACGCCACCGGGCCCGCTTCCACCTGGTG 69960 
LRADRDHPRAGRHRARFHLV 

69961 CGGCTCCACGACGACGAGACGGTGCTCGCGCTCACGGCCCACACCCTCCTCCTCGACACA 70020 
RLHDDETVLALTAHTLLLDT 

70021 CCGTCTCTCTACGCCGTGCTCGGCGCGGTCTGCCAGGCGTACGCCGGCCGCTTCCGCCCC 70080 
PSLYAVLGAVCQAYAGRFRP 

70081 GAGCACTACCGCGACGCCACCACCCTGCCCGACGCGCCCCACGCCCCCCTCTCCGGTCGG 70140 
EHYRDATTLPDAPHAPLSGR 

70141 GCCCGGGCCTCCCGCCGGCGCTGGTGGCACCGGCGCCTGGCCGCCCTGCCCGGCCCGGCC 70200 
ARASRRRWWHRRLAALPGPA 

70201 CCGGCCCCCGCCGGCCCGCCCCGCGACCGGGTGACCGAAACCCACCGGCTGCGCATCCCC 70260 
PAPAGPPRDRVTETHRLRIP 

70261 GCAGCGCGCTGGAAAGCCCTGACCGCCCTGACCGCCCTGGGCGGCCCCCTCGGCGGCAAC 70320 
AARWKALTALTALGGPLGGN 

70321 GGCTCGCTCGCCGTCATGGCCCTGGCCGCCTGGTGCCTGCGCGCCCCGGACCACCGGGGA 70380 
GSLAVMALAAWCLRAPDHRG 



70381 CCGGCCCGCTTCACCACC^TCGTC^CCTGCGCGACCACCTCGGACTC^GCCCGCCGTC 
PARFTTVVDLRDHL.GLGPAV 

70441 GGCCCGTTCACCGACCGCCTCGTCTTCGGCGCCGACCTCGGCGAAGCGCCGCGCCCCTCC 
GPFTDRLVFGADLGEAPRPS 

70501 TTCCGGGACGTCACGCTGCGCGCCCAGTCCGGGTTCCTGGACGCCGTCGTGCACTACCTC 

FRDVTLRAQSGFLDAVvniu 

70561 CCCTACGGCGACGTCGTGG^CTCgGCAGGGAACTGGGCCGCGTCACCGCGCCCCGCACC 
PYGDVVELGRELGRVTAPRT 

70621 GCCGCGCACTGGGACGTGGMCTGAACTTCTGCCGCAACCCGCCCACCAGCGCCGCCA 
AAH WDVALNFCRNPPTSAA1 

70681 CGCGGCGAAj^ 

70741 CTGCTCGGCGCGGCCGGCACCGGTCCCGCGCACQK3TGGGACGGCACGGTCCTCGCCCTC 
LLGAAGTGPAHRWDGTVLiAIj 

70801 TCCCTAGGCGAACTCGGCGACGACACCGTGCTG^ 

70861 CACCACGGAACCGCCGACCGGCTGCTCCACCGGATGGACGAAGCGCTCCTGGCGGCCGTC 
HHGTADRLLHRMDEALLAAV^ 

70921 GCCGACCC<MACGCCCCCcWcCCTra^ 

ADPDAPLPPLPAPAHTTR&H 

70981 CGATGACCACGACCCCGCGGACCGCCGCCGAGCCCACCTACCACGTGGTGGTCAACGACG 
MTTTPRTAAEPTYHVVVNDE 

R * 

71041 AGGAGCAGTACTCGATCTGGCTCGCCGAACAGGAGATCCCGGCCGGCTGGCGGGCCACCG 
EQVSIMLAEQEIPAGWRAlva 

71101 GAACCTCCGGCACCCAGGAGGAGTGCCTGCGCC^ 

71161 GCCCCCGCAGCCTGCGCGAGGCCATGGCCGCGGCGGAGCACGCGGAGCCCGCTCCCGCCC 
PRSLREAMAAAEHAEPAPAf 

71221 CGGCCCCGGCTOAGGAGGAGCCGAGCCTCGTCGACCGGCTCTGCGCGGGCGACCAGCCGG 
APAEEEPSLVDRLCAGDQ1-V 

71281 TGGAGTCGGTCCTCCGCCCGGAGCGCACGGCCGCCGCCCTGCGGGAGGCCGTCGACCGCG 
ESVLRPERTAAALREAVLiK" 

71341 GCTACGTCTTCGTCCGCTTCGCCGCCACCCGCGGCGGCACCGAACTCGGCGTraCCGTCG 
YVFVRFAATRGGTELGVAVU 

71401 ACCCCGCGG^CCACCATGGACGGCACCGAGCTGCGCCTGACCGGCACCCTC^ 

PAATTMDGTELRLTG 1 L, 1 u v 

71461 ACTTCGAACCGGTCCGCTGCCACGCCCGCGTCGACGTGACCACCCTCACGGGCGAGGGCC 
FEPVRCHARVDVTTFTGBuK 

71S21 GCCTGGAGCGCGTGTCCGGCACCTGACCCCCGCCGGCCACCCGGCCGTGAGGCGCGGCTC 
LERVSGT* 

71581 GCMACCGGGCCGCCGACCCACCGAAGGGAGGGACCCCATGACC^CCCC(^TGACCACCCC 

MTTPMTTP 

71641 CACGACCACCCGCACCACCACCCGCACCGCCGTCTTCGCCCACCTCCGCGCCCCCGGCCT 
TTTRTTTRTAVFAHLRAPGL 

71701 CGGCGACCTCCTCCAGCGCAACATCGGCCTCGCCCTCGTCCGCCGCGCCCGCCCGGCG^^ 
GDLLQRNIGLALVRRARPAi 

71761 GG CGGTC ACCCTGGT CGTCGGCGAGG ACCTGG CGG CCCG CTTCGGTC CGGCACTCACCCG 
AVTLVVGEDLAARFGPAb i « 




70440 
70500 
70560 
70620 
70680 
70740 
70800 
70860 
70920 
70980 



71040 
(orf13) 



71100 
71160 
71220 
71280 
71340 
71400 
71460 
71520 
71580 



71640 
(orf12) 

71700 



71760 



71820 
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71821 CCAC^CGTACGCCACCGACGTGCTGCCCTGCCCCCAGCGGGGCGACMCCGACCCCCGGTC 71880 
HTYATDVLPCPQR GEADPK " 

71881 GCCCGCCTTCCTGrcCACCCT^^ 71940 

71941 CAGCCAGGGCCTGCACGCCGGCCACGCCCGGGCCGCCGGCGTGCCCGAGCGGATCGGCCT 72000 
SQGLHAGHARAAGVPERIGL 

72001 GCCGCAGGAC^GCCCGGA^ "060 

72061 GTGCMGGACCCCGGACCTGTACGAGTAmCCACTGCCCTCGCCGC^CGCTGGGCCTGCC 72120 
WGTPDLYEYATALAAALGLP 

72121 CGCACaKCGCGCCCCGGCGACXSTCra^ 72180 
APPRP GDVLPELP 

72181 GCOGACGGcisGTCra^ 72240 
PTAGLPRPLVAVHPGGAfHW 

72241 GAACAGGAGATGGCCGCTCGAGCACTACGCCCGGCTCTGCGCCCGCCTCGCGGCCGAACT 72300 
NRRWPLEHYARLCARLAAEL^ 

72301 CTCGGCCTCCCTCTGCCTGCTGGGCGACGAAGCCGAACGCCCCGAGCTGGAACTGCTCCG 72360 
SASLCLLGDEAERPELELLR 

72361 GCACGCCGTCCTGACGCGGTCCCCGCGAGCCGTCGTCCACCTCGAGGCGGGCGCGGACCT 72420 
HAVLTRSPRAVVHLEAGAUIj 

72421 CGACCGGACCGCGAACGTCCTCGCCGACGCCGACCTGCTCGTCGGCAACGACTCCTCGCT 72480 

drtanvladadllvgndssl 

72481 CGCCCACGTCGCCGCCGCCGTCCGCACCCCGTCCGTCGTCCTCTACGGCCCGACC^CA^ 72540 
AHVAAAVRTPSVVLYGPTG 1 

7254 1 CGAGTACCTGTGGACCAGGATCTACCCGTACCACCGCGGGGTCTCCCTGCGGTGGCCGTG 72600 
EYLWTRIYPYHRGV. SLRWPt 

72601 CCAGCGGCTGCGGCACGCCGCAGGCGAACTCGCCGGCCGGCGGTGCGCG^CGGCTGCGT 72660 
QRLRHAAGELAGRRCAHGCV 

72661 CCTGCCCTACCAGGGCCCGGCOMCCCGTATCCGCGCTGTCTGGCCGACCTGCCGGTGGA 72720 
LPYQGPAGPYPRCLADLPVU 

72721 CAGGGTCTGGCCGGCGGTGACCGCCCGATGGGCGAGCCCCCACCCCGTGACGATCAGGAG 72780 
RVWPAVTARWASPHPVTIRS 

72781 TACCCCATGAGCGCCGACCCGTCCCGGGTGCGGACGATCCTCTCCGTCAACTTCAACCAC 72840 

TP M*SADPSRVRTILSVNFNH (orfll) 

72841 GACGGCTCCGGCGTGCTGTTGCGGGAGGGCAGGATCGCCGGCTACGTCACCACCGAGCGC 72900 
DGSGVLLREGRIAGYVTTER 

72901 CGCTCCCGCCTCAAGAAGCACCCGGGCCTGCGCGAGGAGGACCTCGACEAACTGCTGGAC 72960 
RSRLKKHPGLREEDLDELLD 

72961 C^GGCCGGGGCCGACCTCTCCGACATCGACCACXSTCATGCTCTGCAACCTG^CACCATG 73020 
QAGADLSDIDHVMLCNLHT M 

73021 GACACACCCGACATACCCCGGCTGCACGGCTCCGACCTC^GGAGACCTGGCTCGCGTTC 73080 
DTPDI pRLHGSDLKETWLAr 

73081 TGGGTCAACCAGCGCAACX3ACGAGGTGAGCCTGCGCGGCCGCCGCATCCCCTGCACCGTC 73140 
WVNQRNDEVSLRGRRIPCTV 

73141 AACCCGGACCACCACCTCAWGCCGCCACCGCCTACTACACCTCCGGCTAC^ 73200 
NPDHHLIHAATAYYTSGYU& 

73201 GCGATGGCCGTGGCCATCGACCCCACCGGCTGCCGTOCCCTCGCCGGCAAGGGCAGCCGC 73260 
AMAVAIDPTGCRAFAGKGSR 




73261 CTi 



73 



73441 GA< 



TCTACCCCCTGCGCCGCGACC^^^ 
,321 GTC*CCG A C^ 
,338! CCCTAcLIaCCC^ 

a gacL^ 

LLoLlcGCCACCCTCGCC^CTACATCCAGCTG^ 

LLLLaccgccctcaacgccatcgccacc^ 

TTCGAGCGCAXGCAC^C^CCCCGCC^ 
™cLcA^ 
ATGTACTCCG^ 

73861 

7 ,»i ggcLc^^ 

caccgcagcItcgtcgccgacccgcgcga^ 
gtcLgttcLcgaacacotccggccc^cgcgccgtccgtgct 

GAGTGGTTCGGCCTCTCCGACAGCCCCrcCATGCTGCGGG^ 

EWFGLSDSPFMLRATfv 

ggcgtgcccgccatcacccacgtcgacgggacgtcgaggatccagtcggtcacccgccag 

GVPAlTHVDGTbKJ-v 
GACACCCCCGCCTTCCACGAC^ 

GTGCTCAACACCAGCCTCAACAC^^ 

VLNTSLNTKGEPJ-rt 

GGCCGGACGGCGGCCCGCTCATGAGCGCCCCGCGGOTCGAGCGGACCCGGTOCOTCGCGC 

MSAPR GbK 
GRTAARS* 

TCGAACGCGACATCGCCGCGATCTCGGCCGA 

ACGAGGACTTCGCCGCGCTGGGCGGCAACTCCATC^CGCCATCAAGATCACCAACCGGG 
EDFAALGGNSIHAIKITNK 

TGGAGGAACTCGTCGACGCCGAGCTGTCCATCCGCGTCCTCCTCGAGACGCGCA 
EELVDAELSIRVLLETRl 

! CCGGCATGACGGACCAQ3TCCAOTCCACGCTCACGGGGGAGCGGGACCGGTOAACACCGA 
G MTDHVHATLTGERDR mmtd 

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73320 



73501 
73561 
73621 
73681 
73741 
73801 
73861 
73921 
73981 
74041 
74101 
74161 
74221 
74281 
74341 
74401 

74461 
74521 
74581 TG< 



74641 



M N T D 



73380 

73440 

73500 

73560 

73620 

73680 

73740 

73800 

73860 

73920 

73980 

74040 

74100 

74160 

74220 

74280 

74340 

74400 



74460 
(orf10) 



74520 

74580 

74640 

74700 
(orf9) 
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74701 



74761 



74821 



74881 



74941 



75001 



75061 



75121 



75181 



75241 



75301 



75361 



75421 



CCTGCCCCGGCTGCTCGACCGGA^ 



R L 



D R 



CCTCGACACCTACGTCTGGGGAGCCACCTCGGGC 



T Y V W G A 



P A 



A V 



V A L T 

CCGCACGTTCGCCCGGGGCAGCCGGGCACCGGTCACG 
RTFARGSRAPVTAAVGAGDA 

CTTCACCGCGGCCCTCACCCTCGCCCTCGCCGCCGGCGCCGACTCCGCGGTCGCCGCCG^ 
FTAALTLALAAGADSAVAAb 

ACTGGCCTCCGCCGCCGCCGGCACGGCCGTCGCCACCCCCG^ 
LASAAAGTAVATPGTSTWHA 

75661 CG^M^ 

DELRRLLGGTGKVCRTG l ijf 

CGCCCGGCTGCTCGACCCGGCCGCCCGCGACCGCCGGGTCGTCTTCA^ 



75481 



75541 



75601 



74760 



74820 



74880 



CGTCACCCTGACCTCCGTCGCC^ 

CCGGGCGCTCGGCGCCGAACCGGTGCTGCTCTCCGCGACGGGTGACGACCGCGCCG^CCG 
RALGAEPVLLSATGDDRAl,K 

CCGGCTCCGCGAAGCCCTCCGTGCGCGGGACGTCGACACCG^ 

cLcggL^ 

CGACGAGGGCGGCGAACACCCGTTGCCCGTGGCGACGGACACCGGAAGCCGCCTGCTCGA 75120 
DEGGEHPLPVATDTGSRLLE 

ACGGGCCGCCGGCCTGCTGCCCGCCGTCGACGCCGTGATCGTCTCCGACTACGGGTACGG 75180 



75721 



75781 



A R 



P A 



R D 



R V 



74940 



75000 



75060 



CGTGTGGGAGCCCGACACCGTCGCCCGGCTCGCCGCACACCGCGAACTCGGCCCGTCCAC 75240 
VWEPDTVARLAAHRELGPS I 

CCTX3GTCGTCGACTCCCGCCGGCCCGCGCGCTTCACCGCGCTGCGGGCCAGCGCCGTCAA 75300 
LVVDSRRPARFTALRASAVB. 

ACCCAACCACGCGGAGGCGCTGCGCCTGCTCGACGCCGGCGAACCCCCGCCCGGCCCGGC 75360 
PNHAEALRLLDAGEPPPt.*-* 4 

CAGGGCGGACTGGGCGGCCGCCCTCGGCGACCGGCTCCTGCGCCTGACGGGAGCCGAACG 75420 
RADWAAALGDRLLRLTGAEK 

GGTCGCCCTCACCCTGGACGCCGACGGATCACTGCTCTTCGAACGCGACCGGCCCCCGGT 75480 

LDADGSLLFERDRP , 

75540 



75600 



75660 



75720 



75780 



CGACCTCCTGCACGGCGGCCACGTCTCCTGCCTGAGCCGGGCCAAGGAACTGGGCGACCT 75840 
HVSCLSRAKELGUij 



H G G 



75841 



GCTCGTCGTCGGCGTCAACTCCGACGCGAGCGTCCGACGCCTCAAGGGCCCCCGTCGCCC 75900 
LVVGVNSDASVRRLKGPRRP 

75901 GGTGATCCCCCTCGCCGAACGCATGCGCGTCCTCGCCGCCCTG 75960 
VIPLAERMRVLAALSCVUi^v 

CGTGCCCTTCGAOTACGACAGC^ 76020 
- - — ▼ " r EALK l "^ tl>v 



75961 



F D D D S 



A A 



76021 



CGCCAAGGG^^ 76080 
AKGGDYTLATLPEAPLVQR^ 

76081 CGGCGGCGTCGTCCACC^^ 76140 
HLLPSVADTb l i u ± 



G G V 



76141 GCGCATCCACGCCCTGTCCAGGACCGGCGAGGGAGACACCCCATGAGCCACGCCATCGGA 76200^ 

RIHALSRTGEGDTP* 

76201 CCGAGCCGGCTGATCCCCGCCATCCGCGAAGCGCTCGGGGACGAG^GGACCCCCGGCTC 
PSRLIPAIREALGDEKDPRL 

76261 GCCCTCTACGTCCACGTCCCCTTCTGCTCCTCCAAGTGCCACTTCTGCGACTC 

ALYVHVPFCSSKCHFCDWV i 

76321 GACATCCCCGTCGCACGCCTGCGCGGCGACAGCCGGGAACGCTCGCCCTACGTCACCGCC 
DIPVARLRGDSRERSPYVTA 

76381 CTCTGCGACCAGATCCGCTTCTACGGCCCCCAGCTCACCCC^CTCGGCrACCGCCCCGAG 
LCDQIRFYGPQLTRLGYRPE 

76441 GTCATGTACTGGGGCGGCGGCACCCCCACCCGGCTCACCGGCGACGAGATGACGGCCGTC 
VMYWGGGTPTRLTGDEMTAV 

76501 CACCAGGCCCTCGACGACGCCTTCGACCW 

HQALDDAFDLTGLRQWSVEb 

76561 ACCCCGAACGACCTCGACCCCGCCACCCTCGACACCCTGCGCGGCCTCGGCGTCACCCGC 
TPNDLDPATLDTLRGLGVTK 

76621 GTCAGCGTCGGCGTCCAGTCGCTCAACCCGTACCAGCTGCGCAAGGCAGGCCGGGCCCAC 
VSVGVQSLNPYQLRKAGRAH 

76681 TCGCGCGAACAGGCCCTGGCCGCCGTCCCCCTGTTGCGCCGCGCCGGCATCGACGAGTTC 
SREQALAAVPLLRRAGIDEF 

76741 AACGTCGACCTGATCGCCGGCTTCCCCGGCGAAGCCGTCGAGTCCTTCGAGGAGACCCTG 
NVDLIAGFPGEAVESFEETL 

76801 CGCACCGTCCTCGCGCTCGACCCGCCGCACGTCTCCGTCTACCCCTACCGCGCCACCCCC 
RTVLALDPPHVSVYPYRATP 

76861 AAGACGGTCATGGCCATGCAGCTCGACCGCGAGTTCGTCGAGGCCCGGAACCGGGACGGC 
KTVMAMQLDREFVEARNRDG 

76921 ATGATCGACGCCTATGAACGGGCCATGGCCGCGCTCGGCGCCGCCGGCTATCACGAGTAC 
MIDAYERAMAALGAAGYHEY 

76981 TGCCACGGCTACTGGGTGCGCGACGCGCGCCACGAGGACCAGGACGGCAACTACAAGTAC 
CHGYWVRDARHEDQDGNYKY 

77041 GACCTGGCCGGCGACAAGAkGGCTTTGGCAGCGGCGCCGAATCGATCATC^T^C^C 
DLAGDKIGFGSGAESIIGHH 

77101 CTGCTCTGGAACGAGAACAGCGCCTACGCCCGCTACCTGCTCGCCCCCCGCGAGTTCTCC 
LLWNENSAYARYLLAPREFS 

77161 gccgcccaccggttcaccaccgccgaacccgaccgcctgAccgcccccgtcggcggcgcg 

AAHRFTTAEPDRLTAPVGGA 

77221 ctgatgacccgtgaaggcgtggtcttcgcccgcttccgcagactgaccggcctggacttc 

LMTREGVVFARFRRLTGLDF 

77281 GCGGACGTCCGCGCCACACCGTACTTCCGCCAGTGGTTCGAGCTCCTGGAGCGCTGCGGC 
ADVRATPYFRQWFELLERCG 

77341 GGCCGCTTCGTCGAGACGCTOTACAGCCTCCGCCTGGAGCCXSTCCACCATCCACrcCGCC 
GRFVETPYSLRLEPSTIHRA 

77401 TACATCACCCACCTCGCCTAC^CCATGGCCCATGGCCrGGCCCCCGAACGTO 

YITHLAYTMAHGLAPERA* 



76260 

76320 

76380 

76440 

76500 

76560 

76620 

76680 

76740 

76800 

76860 

76920 

76980 

77040 

77100 

77160 

77220 

77280 

77340 

77400 

77457 



SEQ ID NO: 2 ORFS BLM gene cluster ORFs 31-40 

(notice this part is on the reverse strand and the 1st nucleotide (18660) is 
the first (1) on the whole cluster of 77457 bp. Also the last orf (40) is 
incomplete and contains frame shifts) 
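
The layout used throughout this listing, a nucleotide line followed by its one-letter protein translation, and the reverse-strand orientation noted above can be checked with a short script. The following is only a minimal sketch and is not part of the original listing: the function names are hypothetical, and it assumes the standard genetic code, under which a GTG start codon reads as V rather than the M shown in the listing.

```python
# Illustrative helpers (hypothetical names), assuming the standard genetic code.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Build the 64-entry codon table from the conventional TCAG ordering.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate in frame 1; codons with unreadable bases become 'X'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


if __name__ == "__main__":
    # First 24 nucleotides of the line numbered 55141 in the listing.
    print(translate("TTCCAGCAGTGGGAGGTCGAGAAG"))  # FQQWEVEK
```

Translating a nucleotide line and comparing the result with the protein line beneath it is one way to spot characters damaged in reproduction. For ORFs given on the reverse strand, the sequence would first be passed through `reverse_complement`.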






! gtgaccgac^c™^^^ 60 

MTENLPSCPECSSAi 
(orf31) 

DNPEDGAIRDAV^ 

m a^cgtcacggt^^^ 240 

m f CCACGGG r c_ 360 

361 ACGCCGGCC^GGCCCTGCCCAGGCTCCACTACGCCGCGGCGCAACCGAGCCGGAACGGG .20 

421 GCCCGGGCCCGCTCCAAGTCCCGTTCCGTGCGCGGCCGCGGCAGCCAGGCCGTGTTCACC 480 

48 1 CTGGGGTCGCCGTCCCCGTTCGCACGCGTCGTACACGCCACCACGCACGGCACGGAACTC 540 

541 CCCCAACICGCCACG^CCCCAAGTCCCCGCGTGCCCGGATCCGCCCGGACCGGCGTCGG 600 

60 1 TCCGCCCGCCGGGCCGCGGCCGGGTCCCCGGGCCGCGGCGGGAGGGGGTCTCGCGCCGTG 660 

66 1 GAACGCCGGCCGGAAATTTACGTATAGGTAGAGATCCCGGCGAAGCGATCGGCGCGTTAT 120 

721 GGCAGCATCCGCGCCGGCCCGCCGCGCAGTTCCTCX3GTCCCGGACCGATGGCGTCAAAAG 780 

,81 TGAGCGACGAAATCGCCGGATCGCGCGAGGACCGTCGCGGGCCGCACGAGGACAACCGGG 840 

B41 GGATATATCAGCGCATTCCCAGGTCACGCGTTGACTGGAAATCGCCTACTTATCGCGTCA 900 

90 1 CGCCTGTAGGGATCATGGCCGGGAATGGCCTCAGACGCT^GAGTGCCCACC .60 

(orf32) 

961 TCCGACTGTCGGCAGCGCG^GGGGATCACGG^ *»• 

1021 ™/Ta^V T ^ 1080 

108 i axcctgcacaaaggtgccgcgaccgotx^^ 1140 

1141 GCCACCCGCCT«^ 

ATRLADAGAGPb i 

12 01 f CGA T CCC^ 

12 61 CCCCTCX*TCCC(^^ »" 
PLDPGI PAGRLKb* 

1321 GAGG^C^^ »" 







1JB 1 TTOCKKAC^^ 

1M1 gcgtcggaaatcgccta^gc™^^ 1500 

1501 GTCGTCTCCTATCGTGATATGGCGCGCTACCTTCA »5« 

l561 AGGGOSGAGATTCTCCGGCTGG^ "20 
1621 GCCGACGAC^CCCGGCC^^ 1680



«.! GTCGTGGCGCAGGTCTGGTGCG^ »" 

1741 ttc^gacctgggc™^^ 1800 



F F 



X.01 C T G CT CGGCGTCGAGCTGCCG^^ 1860 
LLGVELPLRALFUAfi 

1861 GCCGCCCGGGTGCGGGCCGAACAGGCCGGCGGCCAGG^ »»• 
„21 GAGCCGGTGGGCCGGAGCGAGCCGCTGCCGCTGTCG^CGCA^^C^G^C 1980 

„B1 CTC^ACCGCTTGATGCCCGACCGCGCCT^^ 2040 
LDRLMPDRAFYTMCDAfK 

2041 GGCGGGATCGACCTGGGTGCGCTCCGGCGG^ "00 
21 01 ACGCTGCGGACGGCGraGTC^ 

2161 G^CGGGCCCGGT^^ 2220 

2221 cccg^cgaccgggaggagg^gg™^ 2280 

228 1 CGGCCGGCXGACGGCGCGCTGCTGCGCGTGOTGGTGGCC^GCTGGCGGA 2340 
RPADGALLRVVVAR 

2 341 GTGCTGGTGGTCAGCACGCAC^CATCGTCTCCGACG^ 2400 



V L V V S 



2 401 GACGAACTCGGACGGCTCTACCGCGAGTCOTTCACCGG^ ^60 

2461 CCGGCCGXCCAGTACGCC^ 

252 1 CA^AGGAG^TC^ 2580 

2581 CTGCCCATGGACCACCCGCGGCCCGCCGTGCAGTCCGAGCGGGGCGAGACGGTCGGGTTC 2640


LPMDHPRPAVQSERGETVGF
264 1 GCGCTGCCCGACGCGCTGGTCGCCGCGCTGG^ "00 

270 1 CTGTTCATGACGCTGCTCGGCGCCCTC^GGTCCTCCTGGCGCGTCACG 

LFMTLLGA* 1 u v 



D I V V G V 
282 1 PXTIcWj^^ 



2881 GACCTGCTGGACCAGGTGCGCGAGGCCGTC 

2941 ^CGAGG^TCGAGGCG^ '"° 

300! CAGGTCACCTTCCAGCTC^CACACCCGCGG^ "«« 

QVTFQLLG1 ^ n ^ 

3061 GTCGAGCGGTACCCGGTCCAGGAGGCCGTCTC 3120

3121 CGGGCCGACGACGGTTCCTACCGGGGGATCCTGAACTACTGCCC^ 3180

RADDGSYRGILW 

31S1 CGCCaCATGG^^ ™° 
324 1 CCGGGCCGCCCGATCGGTGAGCTGCCGCTCTCCG^ 3300 
3301 GACGGGTTCGGGA^GCGGCACGCCKCG^^ "60 

3361 GCGGAGGTGGCGCGGACGGCACCGGACGCGCGGGCGGTGACGTGTGGCGCGACAACGCT 3420

AEVARTAPDAKMv 

342 1 ACC^AG^ 

3481 GTCACCCGCGAGACGCCGGTCGCGGTCCGCCTGCCCCGTTC 3540

VTRETPVAVRLPKb i u & 

3541 CTGCTGGCCGTCATGCGGGCGGGCGGCGTCTACGTCCCCCTGGACCCCGACTGGC^ 3600

LLAVMRAGGV 

3 6S1 CCCG^CC^ " 20 

372 1 CCCGCGCCCCGCATCGAC^ 3780 
PAPRlDPDQAAYVITf^ 

378 1 GGCGCGCCG^GGGCGTCGTCGTCCGGCACCGCTCCCT »« 


3341 CAGGCCACC^^ " 00 

39 01 GACGCG^GCG^ 3960 

frrsTTfrr?^ 4020 

402 1 CCCTCGG^G^^ 4080 

408 1 TCGCACCTC^^ 4140 

4X41 GTCATGG TC G^^ 4200 

42 oi ooacicKiBT^^ 4260 

4261 GACCTGTCCGACCCCGCCGA^^ 

4321 CO^GC^i**^^ 

RVLDDRLR pv ^ VXJ 

438 1 GGCGGAACC*GC^ 

4441 GTCGCCGACCCCTACCCCGACArc « M 

4501 CGC^GCGCCCCGA^ < 5S ° 

4561 CGCGG^TCCGCGTCGAACC^^ *«° 

«» rr GCCG^ 4680 

4681 J*™*^ "«° 

4741 CACATGGTGCCGTCGGCGGTCGTCGTCCTGGAGGOTCTCCCACT 4800

HMVPSAVVVLBAlif" 

4801 ^ACCGCGCGCGCCTG^ " 6 ° 

4861 GTGGCGCCGCGCGACATGGTCGAGGAGCT 4920
4921 GTCGACCGGGTCGGTGTGCACGACGACTTCTTCGAG^GCG^CACTCGrTGCTGGTG 4980 

4981 GTCCAGGTGATGACCCGGATACGAAAGCTCCTCGGCGTCG^ 5040

VQVMTRIRK LLGVEV 

5041 TTCGACGCCGCGACGGTCGAGGAGCTCGCCGCCCGCGTCCGCGCCGCACGGACCGAGGGC 5100 


FDAATVEELAARVRAARTEG






5101 CICOOCCC*^ 5160 

5161 TCGTTCGCG^GCAACGCCTTTGGTACCT^AT^G^GGCGCCTCACAGTGTCTCCTAC 5220 

5221 ^TG^AC^ " 8 ° 

5281 CTGCGGACGCTGGTCGAGCGGCACGAGACGCTGCGGACG^ "40 

5341 GTGCCCCACCAGGTGGTCTCGGCGCCCGACG^^ S.00 

5401 GTGCGGATCGAGGCGGCCG^GCGGACCGACGAGGCGGTGCGGGACCTC S4C0 

5461 GCGCGCACCCCGTTCCGGCC^ 5520 



A R 



5521 GCGGACGACGATCACGTGCTCGTGGTCACCACGCACCACATCCT "SO 

5581 GTCGACATCCTGGTGGACGAATTGGGGCGCCTGTACCGGGAACACGTCACGGGTGACCCC 5640
VDILVDELGRLYREHVTGDP

56 4i «»«ic«^ 5700 

5701 ATGACCGGCCCCGTGCGGGAGGAGCAC^ 5760 

57 61 CCCTCGGTC^^ 5820 

5821 GAGACCGTCGAGTTCCCCCTGCCCGCACCACTGGTCGCG^GCTGGAAGCGCTCTCCCGG 5880 
ETVEFPLPAPLVARLEALSR

5881 CAGCAGGGCGTC^^ 5940 

5941 TACAGCGGT^ACGACGTGO™^ 6 °°° 

6001 ACCGAGCCCCTGGTCGGCTTCTTCGTCAACACCCTTCCGGTAC^ "60 

teplvgffvntlp 

6061 GAGCTGTCGrrCCGCGCCCTGCTCGACCGGGTCCGCGAGGCCGCG^GGCGCCTTCGCC 6120 
ELSFRALLDRVREAALGAFA

6121 CATCAGGACCTGCCCTTCGAGGCGCTGGTTOAGGCGCTCGCGCCCGAGCGCGACCTGGGC 6180 
HQDLPFEALVEALAPERDLG

6181 CACCACCCTCTCGTC^^ 6240 

6241 CTGCACGGCACGGACTGC^ "°° 




6301 TCCCTCGACGTCGTCT^^^ ««• 

SLDVVSGRRGKRt- v ^ 

6361 ctg^cga^^ 6420 

6421 gcggccgacgatccccc^ 6480 

6481 cb^cc*™^^ 6540 

6S41 c^cccc^cocccccc^ 6600 

6601 GACTCGACGCTC^CGTTCGC^ 6660 
6661 CGGCGCTGCGCCGTGGCC^^ " 2 ° 

6781 GACTGGCCC^^ 6840 



6641 ACCCGCC^^^ 6900 

69 0X GA^ACCGGGATCCCCTG^ 6960 
6961 T CGG G GCXC«CCCGCGCCC^ 

7021 CTGGGCCACGTACGGCGC^ 7080 

LGHVRRMAEGGPK^^ 

7081 gccatgaccttcgacccgt^^ 7140 
amtfdpsleqfl.wu 

7M1 CACGTCCCGCCCGAGGAC*^^ ™ 
HVAPEEVKKurt. 

72 01 G^CGATC^ 7260 

7261 ^gctggagc^^^ 7320 

7321 «GC«m»«^^ 7380 

7381 CCTACGGAGGCGACGGTCGACGC^ 744<> 

pTEATVDATCHLii- 

7 < 41 C^TCCGCACCCCAC^ ™ 
7501 GTACCIMTGGGCGTCGCCGGCGAAATCTACCTCGGCGGAACCGGCCTGGCCCGCGGCTAC 7 56 0 




VPVGVAGEIYLGGTGVARGY

7561 c TT co^ 7 "° 

7621 CGCAGC^^^ 7660 
7681 

7741 ««MOCC^ "°° 

7801 o^trr^ 7860 

7861 CATXCGC^^ 7920 

7921 ACCCACACCGACG^^ " 8 ° 

7981 CTCA^TG™ 8 ° 4 ° 

.041 CGGCTCCTGGAACGGCCGGCCGAGCGCGTC "100 

8101 s"™ 0 ^"^ 81 

8161 GTGGACTGGCTCCGGGACGGGCTGCGCCGCCGCCCCGCGCACCGGGTACGGCTGCT 

VDWLRDGLRKKfc'Mn 



.60 



8340 



834 



84 



X GACGCC»CGGCCGACC^^ 8400 



VARAAGEtAAPiu« 



8521 TACTTCGCCGCGCTC^^ 8580 
YFAALAARSPRV i 

eS81 CGGGGACGGC»CCGCAAC^^ 8 "° 
RGRHRNEMSLYRYUV 

8641 GGTGACCGCCCGGCGGCCCCGCAGGCGGAGGTGCTCA ™ 

8701 CTCGCGTCGCTGTCCGCCCGCCT^^^ 8760 
LASLSARLi" 




8761 GTCGCCAACGACCGTCTGACGOSGGACAACGAGCTGCTCGACGCACCCGCCCGCACGACG 8820 
VANDRLTRDNELLDAPARTT 

8821 GCCGTCGAGCCCGAGGACCTGTGGGGGCTGGCGGACTCCACCCCCTACCGGGTGAGCGTC 8880 
AVEPEDLWGLADSTPYRVSV 

8881 AGCTGGGCCGCCGCCGATCCGCGGGGCGCGATGGACGTCCTGCTGGTCCGGCGGGACGCC 8940 
SWAAAD pRGAMDVLLVRRDA 

8941 CACGACGACGGTCCGCTGCTCGTCCCCCACCCCGTACCGGAGCCCTCGGCACCGCTGACG 9000 
HDDGPLLVPHPVPEPSAPLT 

9001 AACACGCCGACCCGGCACCCGTCCGCGCGGCAAGGGGGCTCGGCCGCX3GACGGGCTGCGT 9060 
NTPTRHPSARQGGSAADGLR 

9061 TCCTGGCTCGCCGAGCGGCT^CCCGCGCACCTGCTGCCCGCGAGGATCACCGAGGTGGAC 9120 
SWLAERLPAHLLPARITEVD 

9121 GCGCTGCCCCGCACCGGCACCGGCAAGCTCGACCGGGGCGCGCTCGGCGGACTCGTGACC 9180 
ALPRTGTGKLDRGALGGLVT 

9181 GCGGGCCGTGGCGCCCGGGCGGGCGACCGCCCCGCCACCGCCCCCCGTACGGGTCTCGAA 9240
AGRGARAGDRPATAPRTGLE

9241 CGGACCCTGGCCGACGCGTGGGCGCGGGTGCTCGGCCTCCCCGAAGTCGGCGTGCACGAG 9300
RTLADAWARVLGLPEVGVHE 



9301 AACTTCTTCGCCCTCGGCGGCGACTCCCTCCTCGCCGTCAGGGCTGTCGCCCGGTGCCGC 9360
NFFALGGDSLLAVRAVARCR



9361 CGTGCCGGGGTCCGACTGACCGTCCGGCAGTTGCTGAGCGAGCAGACCGTCGCCGCGCTC 9420 
RAGVRLTVRQLLSEQTVAAL 

9421 GCGGCGGCCCTCGAGGAGGAGTCTCAATGATGAAGTCAAGCCGCTTGCGCGACCGGCAGC 9480
AAALEEESQ* 

A MMKSSRLRDRQL 

(orf33) 

9481 TCGGGGGTGAAGACCOJGTTGTCGCGC^GGAGAGCCCACAGGACGCTGGCCCGACGCCGT 9540 
GGEDPVVAQESPQDAGPTPC 

9541 GCCAGGGCGATGACGGCTTGAACGTGTTTGCAGCCCTCGCCGCGCTTCTTGAGGTAGAAG 9600
QGDDGLNVFAALAALLEVEV 

9601 TCCCGGTTCGGCCCCTCCCGCATCATGCTGGTTTGGGCCGACATGTAGAACACTCGTCGC 9660 
PVRPLPHHAGLGRHVEHSSQ 

9661 AGGCGGCGGCTGTAGCGCTTCGGCCGATGCAGGTTGCCAGTGCGACGACCGGAGTCGCGG 9720 
AAAVALGPMQVASATTGVAG 

9721 GGGACGGGCACCAGGCCGGCCGCCGAGGCCAGGTGACCGGCGTCGGCGTAGGCCGTGAGG 9780
DGHQAGRRGQVTGVGVGREV 

9781 TCGCCGGCGGCGACGACGAACTCGGCGCCGAGGATCGGCCCCATGCCCGGCAGAGACTCG 9840 
AGGDDELGAEDRPHARQRLD 

9841 ATGATCTCGGCCTGTGGATGGCTGCGGAACGTCTCGCGGATCTGCTGGTCAATCCGCTTC 9900 
DLGLWMAAERLADLLVNPLQ 

9901 AGACGGTCGTCCAGGGCCAGGATCTGCGCGGCCAGGTCAGCCACGATCTGGGCGGCGACG 9960 
TVVQGQDLRGQVSHDLGGDV




9961 TCCTCCCCGGGCAGCGCGGTCTGCTGAGCCTGGGCAGCCTCCAGCGCCGTCGCGGCGACG 10020 
LPGQHGLISLGSLQRRRGDG 

10021 GCGTCGGCACCGCGCACGCCTCGGTTGGCCAGCCAGGCCGTCAGCCGGGCCCGGCCGCGG 10080 
VGTAHASVGQPGRQPGPAAA 

10081 CGGCGGAGAGCTGCCGGGGTCTtKTAGCCCGTCAGCAGGAC^GOTCGCCCTT^CGAG 10140
AESCRGLVARQQDQRALLRA 

10141 CTGTAGTCGAAGGCCCGTTCCAGCGCGGGGAAGACGCCGGT^GOTTGT^CGGAGACGG 10200 
VVEGPFQRGEDAGQRVAHl v 

10201 TTGATCATCCTGACCC^T^CCACGAGG^^^^ 10260



D H P 



DPVGHEVGTVGGQQREV 



10261 TCGGCGGCCAGCTGGGCGGGCACGTCGATCGACGCGAAGTCCCG 10320 
GGQLGGHVDRREVPSVAG^f 

10321 TCGGCGATGACGTAGGCGTCGCGGGCGTCGGTCTTCGCCTCGCCCCGGTAAGCGCCG 10380 
GDDVGVAGVGLRLAPVSA^H 

10381 ATGCGGTTGACCGTGraGCCGGGCACGTAGACGGCCTGCTGGCCGTGGGCC^ 10440 
AVDRAAGHVDGLLAVGREgc 

10441 GCCAGCAGCAGC^CGGAGGACGTGCCGGAGATOT ^^00 

10501 AGGTCGAGGATCTCACCCATGGCGGTCAGGATCGCCGACTCATCGTTGCCGATOT 10560 
VEDLTHGGQDRRLIVADLLR 

10561 GACCACAGCGTCACACCGGTCTCGTCGACCACCGCCGCCCAGTGATGCCC™ 10620 
PQRHTGLVDHRRPVMPLARV 

10621 TCGATCCCGGCCCAGACCCGGGCCCGTCGCTCGCCCACTCGCCCCTCCTCA^ 10680 
DPGPDPGPSLAHSPLLTPNb 

10681 GCATCCCGTCGACCCGAGGAACACCCCGCTGTCATCTCCGTAAAAAGCGACCGAAGCGCA 10740 
IPSTRGTPRCHLRKKRPKR 1 

10741 CATCrCAATCAGCAGCCAGGGCGCCCCGGAGAACCGGGCGGCCACTCCTTGTAAGCCA 10800 
SQSAARAPRRTGRPLLVSH 

10801 GACGGCAGAGAACCATAAGCCACACCCGGCCCTCCCGGGCCGCCTAACAACTTACGGAGA 10860 

10861 J^TOACTOACCTGCOGTTCCCT 10920 
MTDLPLRTVALTGEESAEV 

(orf34) 

10921 GACGACCTGCTGCGCACGCTGGCCGACGTGCCGGTCGACTCCACCGTGGGACTGCT 10980
DDLLRTLADVPVDSTVGLLH



10981 CGCACCCGGCTCGCCGCACAGGAACTGCCGCTC 11040
RTRLAAQELPLRIRAELTGM 

11041 CGGCTCTACGACAGCCCGCGCGCCCTCGTCGTCACGGGCTTCGGCGTCGACGACGAACGG 11100 
RLYDSPRALVVTGFGVDDER 

11101 ATCGGACCGACCCCCGCGGCCC^ 11160 
IGPTPAARPAPDPERTRDLE 

11161 CTGCTGCTTTTGCTGCACGCGGCCCTGCTCGGCGAGGCGTTCGGCTGGGCGACCCAGCAG 11220
LLLLLHAALLGEAFGWATQQ 




11221 AACGGCCGGCTCGTCCACGACGTGCTGCCCGTTCCC^TGAGGAGACCGTOCAGATGGGT 11280
NGRLVHDVLPVfo 

11281 TCCAGCAGCGAGACCGAGCTGCTGTGG^CACCGAGGACGCG^C^CCOTCTGTCCTGC 11340

U3,l GACTACGTGOTCCTGCTGTCCCTGCGCAACCACCAGTOCGCCG ««. 

1K01 CCCGACCTGTCCCGGC^ »«« 

ATC^CCCGGM*^ 11520 

UH1 CGTTCGCGGCGATC^^ 11580 

i 15 ei GAcrccGAGGACCcGTAC^ 

11«« GCGGCCGCCCGGCGGGCGTACGACACCGTCACCGCGCTCATCGAGGACGAGCTGC^ 117.0 
ftAARRAYDTV I J\ u i. & 

11,01 GTCGTCCTGGACGCCGGTTCACTGCTCCTGGTCGACAACTACCAGGCGGTC 11760 
VVLDAGSLLiLVUW iu 

11,61 AAGC™^^ 11820 

11821 CGCGACCTGCGCCGTTCC^ 11880 

11BB1 AGGCACCATGGArTTCCCCCTCACCCGCGT^ »™ . 
(orf35) 

11941 CCGCCCCCGGGTGCGG^TGC^ 12000 

12001 <*ACTGGCCCGCCGCGCTG^^ 12060 

12061 CGGCGAC^CATCGAACW 12120 

12 i2i ccacgcgcjgccccc^ 12180 

X21B1 GCTGGGGTACGAAGTC^CGGCGCGGCTC^^ 1«« 

1224! CGCCGCGGCCTGCCGTCCCCO^CGTT^^ 1«00 
AAACRPPHVPPDAb^*- 

X2301 CGAGCraCGGCC^ 12360 

!„„ ACTGATGCAAGCGGTG^G^ 12 «° 

12421 C0GCC0G(WcGOCCoitCGACCTO0i 1248 ° 




RPRPRPLDLPLKVYIGADDD 

12481 CGGCACCGACTGGCGCACCACCCTGGGCTGG^^ 12540 
GTDWRTTLGWRACTARDCEV 

12541 CGTCGTCCTGCCCGGC^ "600 

12601 CGTCGCCACGGACCTCGCCGAAGCCGAGGTAGGGGCATGACCGCGCGCGTCGACGCCACA 12660 

VATDLAEAEVGA* „ » „, 

VAi MTARVDAT 

(orf36) 

12661 CCCACCTACCTGGCGGTGCTGGCGGTGCGCGAGGCCCGCGCCCCGCTOT 12720 
PTYLAVLAVREARAPLLGSC 

12721 CTGGCCCGCATGTCCTTCGCGGTGCTGCCGCTCGCC 12780 
LARMSFAVLPLALLLSVRDA 

12781 ACGGGGTCGrrCGCCGTCGCCGGACTGACCTCCGGCGCGCTGTCGGCCACGCTC^ 12840 
TGSFAVAGLTSGALSATLTb 

12841 TTCGCGCCCGCCCGCGCCOSGC^ 12900 
FAPARARLIDRRGSRSGLVR 

12901 CTGACCGTCCCGTACCTGCTGGGGCTCGCCGTGCTGATCACATTGGCCGAGGCGGAAGCG 12960 
LTVPYLLGLAVLITLAEAEA 

12961 CCCACCGCGGCGCTGCTCGTCGCCGCCGCGGTCGCGGGCGTGTTCGCGCCGCCGCTCGGT 13020
PTAALLVAAAVAGVFAPPLG 

13021 CCGACCATGCGCGTGCTGTGGGCGAGGATCCTGCACGGCCGTCAGCCCCTCCTGCACACC 13080 
PTMRVLWARILHGRQPLLHT 

13081 GCCTACGCCCTCGACTCCGTCACCGAGGAGGTGGTCTTCACCGTGGGGCCGCTGCTGGCG 13140 
AYALDSVTEEVVFTVGPLLA 

13141 GGCGGCCTGATCGCGGTCGCGGCACCGCTCGCGTCGATGATCACGGTCATGGTGCTGA^ 13200 
GGLIAVAAPLASMITVMV LI 

13201 GCGGCCGGTACCGCCTGCTicGTGCTGTCCGCCGCGACCGCCGCCGCCCCCGCGTCGGGC 13260 
AAGTACFVLSAATAAAPASG 

13261 GAAGCCGACGAGGACCG^CCGCACGGCCGGCCCATGGCT 13320 
EADEDRPHGRPMALPGMRTI 

13321 GTGCTGTCC1TCGGCGGCGTCGGCCTGGTCGTCGGGGTGCTCCAGGTCGTCCTGCCGTTC 13380 
VLSFGGVGLVVGVLQVVLPF 

13381 ATCGCCGAC^CGCGGGCTCGCCCGGCGCGGGCGGCATCCTGCTGTCCATGCTGTCGGCG 13440 
IADHAGSPGAGGILLSMLSA 

13441 GGCAGCGCGGTCGGCGGCCTCGCCTACGGGCGGATCGCCTGGCGCTCGACGCCCGTC 13500 
GSAVGGLAYGRIAWRSTPVR 

13501 OMTTCGTGGTGCTCGTCAC^ 13560 
RFVVLVTGFTLAVLPLCLTA 

13561 AGCCCGGTGCCGGCCGGGGCCTTCGCCCTCCTCGTGGGACTCrrGCCTCGCCCCGCTGTTC 13620 
SPVPAGAFALLVGLCLAPLF 

13621 ACCACCGCCTACCTGCTGGTCAACGACCTGGTGACG 

TTAYLLVNDLVTA^SGTAPTE 




13681 -c^cacc^ 13740 

„,« GGTGTGCTGCTCGACTCCCGGGGCCCCACCTTCACCGT<^CCGCCGCG^CKCGGTCGCC 13800 

GVLLDSRGP^ V1V 

13301 CCCGCGACCGCCGTCATGACCG^^ »»«« 
11M1 CC^CCGGCCGCCGCC^ 13920 

„ M1 ACCGATCG^^^ " 98 ° 

(orf37) 

13,81 ^^^^^^^^v^d^^^^^^^^^Q^^^^^^^^^^'^^^^^^' " 040 

14041 GTGCGACCTGC T C»CGGATCCG^ 

CDLLTDPVEVHrtMv 

14101 CGACTGGTCCACGGCCACCGGCGCCGGTCA^ "160 

DWSTATGAGHLRWw i 

14 i«i <*r«rrcooi^ 

14221 CTCGGCCCTCGCCGACGCCCTCGTCGAACGGGAG^ " 28 ° 

14281 CAACCCCAGCGCGCCCGGCGTCGCCACGCTCACCCG ""0 

NPSAPGVATLTKUr 

14341 CATGGACCTCCTCACGGACGGGGAACGCGCCGCTCTG 14400

MDLLTDGERAALLAEI



14401 gcgggokacgccc^ 14460 

14461 ^g^Q^^^^p^^^^^^^^^^^R^^^G^^^^^^^^^^^^^^^^ 14 " 0 

14521 GGCCGCCGTCCTCCA^ 14580 

14S81 C^^^^ 14640 

14641 ^p^y^g^^^^^F^c*^^^^^^^^^^^^^^^^^^^^^^ 14700 

14701 CGCGGCGCTGACCGCGCTCCTCCGCGAACACGAGGCGAGCCGATGACCCTCACCCTGCGG 14760 
AALTALLRBHEASR mtltlr 

(orf38)

14761 GACGCCTTCCTCGACCAGGCCGCCCGGA^ 14820 

dafldqaartpd 

14821 ACTGTATGGACGTACCGCGMCTGGAACTGCGGGCCGG 14880 

TVWTYRELELRAGRMA« 

i«8Bi gcacgcggcgccwgccccggcacgctggtggcgg^ 14940 

X4941 GTCGCCGCGCTCCTCGCGGTCGTGCTGACGGGAGC^ «000 
VAALLAVVLTGAGYVPI-auu 



15001 GACCCGCCG^ 15060
15061 GAGCACCCCTCGCGGGACG<^CGCACCCTCACCCCGGACGAGGCGCTGGCACCCGCCCGC 15120 

ehpsrdgrtltp 

15121 CCGTTCGACGCGGCCCCGGTCCGGGCCGGCGACCCGGCGTACGTGATCTACACCTCCGGC 15180
PFDAAPVRAGDPAYVIT13" 

15181 TCCAGTGGC^TCCGAAGGGCGTGCTGGTCGAA^ 15240 

15241 CAGGCCCGCGCGCGCTA^ 15300 

15301 TTCGACATGGCCGTGACCAGTCTGTGGGGCCCGCTCGTCAGCGGCGGCGCGATCCACGTG 15360 
FDMAVTSLWGPLVSGGAin 

15361 CTCGACCTGA^GGCGATCGCCTCCGGCACCCAGCCG^ "420 

15421 TCCTTCCTCAAGGTCACTCCGTCCCACCTGCCGCTGCTGTCCCTGCTGCCGGACTCCTGC 15480
SFLKVTPSHLPLLGL!.pus>- 

15481 CTGCCCACCGGGCAACTCGTGATCGGCGGCGAGGCGCTGACCGGCT "540 

LPTGQLVlGGEALTGSAi J w»r 

15541 TGGCGCGCCGCGCACCCCGACGTCACGGTCGTCAACGAGTACGGGCCCACCGAGGCGACC 15600 
WRAAHPDVTVVNEYGPit«» 

15601 GTC^CTCCTCCGCGTA^ 15660 

15661 ATCGGACGGCCGTTCGCGGGCACCCGCCTGTACGTGCTC^CGCGGACGGCGAGCCGGTC 15720 
IGRPFAGTRLYVLDAuv.o«- 

15721 GCCGTGGGCGGTGTOTGWTGCACATCGCGGGCGACaGCTGGCGCGCGGATACCTG 15780 
AVGGVGELHIAGDQLARGYL 

15781 GGGCGCCCG^GCTCACCGAGGAACGOTOGTCC^ " 8 «° 

15841 CGGATGTACCGCACCGGCGACCTGGTGCGCGAACGCCCG^CGGCGACCTOGAGTACCTC 15900 
RMYRTGDLVRERPDGDLEYIi 

15901 gggojcgcg^cgggcaggtc^ 15960 

15961 GCCGTGCTCTOCKCCACGCGGGGGTGAGGGACTC "020 

16021 p*^^^^^^^^^v^^^^*^^^^P^^ A ^^'^^ A ^^^^^^^*^^^ 16080 
16081 GCGCCGGCGCGGCACGCGGCCGAGGCGCTGCCGCCG'JACATGGTGCCGGCGACGTTCGTC "140 

APARHAAEALPPYMVPATFV

16141 ACCGTGCCCGAACTGCCGCTCACCCCCAAMGGAAGCTCGACCGGGACGCGCTGCCCGGC 16200 
TVPELPLTPNGKLDRDALPG 

16201 CCCCCTGCCGGCGACGCCGGGCCGGGCGACCXSCACCCCGGCCGAC^CCC^^TGCGAG 16260 
PPAGDAGPGDRTPAETLLCE 

16261 CTGCTGGCACGGGCCCTGGGCATCCCGGAGATCGACGCCGACGCCGACTTCCTGACGTCC 16320 
LLARALGI PE1DADADFLTS 

16321 GGCGGCACCAGCATCACCGCGCTGAAGCTGGTCGCCGGCGCCOTCCGGCTCGGCATCCGC 16380
GGTSITALKLVAGARRVGIR 

16381 CTCGAACTCACCACCGTCCTCCGCGAACG^CGGTGCGCCGCATCCTGGCGGCCCAGCCC 16440 
LELTTVLRERTVRRILAAQP 

16441 GACGCCGCCTCGCCCCTCGCCGAAGGAGTGCCCGAGTGACCGGTTCCGTAACGCTCACCC 16500 
DAASPLAEGVPE* (orf39)

16501 CCCTCGGCGGGATCATCCC^GGCCCCGCGGCGAGGGGCTCACCACCG^CGCCGAGTACG 16560 
L GGI I PRPRGEGLTTGAEYD 

16561 ACCTGGGGCCGCTCGGCGACGCGGGCCCCGACTGgGTGCGGGCCCACGGCCCGTCACTGC 16620 
LGPLGDAGPDWVRAHGPRLK 

16621 GCGAGCGCCTCGCCACCGACGGGCTGATCCTGCTGCACGGTCTGCCCACCGACGGAGAC^ 16680 
ERLATDGLILLHGLPTDGDG 

16681 GCGTCGACGGCTTCCACGACGTCXSTCGGCrCCGTCGGCGGCGACCCGCTGCCC^ 16740 
VDGFHDVVGSVGGDPLPYTE 

16741 AGCGCTCCACCCCGCGCAGCGTGGT^ 16800 
RSTPRSVVKGNIYTSTEYPA 

16801 CCGACCAGCCCATCCCGATGCACAACGAGAACTCCT 16860
DQPIPMHNENSYAAHWPSTL 

16861 TCTACTTCTTCTGCCACACCGCGCCGGACACCGGCGGGG 16920 
YFFCHTAPDTGGATPIADGR 

16921 GCGCCGTCCTCGACCTCATCCCGGCCGAGGTCAGGCGGCGGTTCTCCCAAGGGGT 16980 
VLDLIPAEVRRRFSQGVVY 



A 



16981 ACACCCGTACGTTCCGCGCCGACATGGGACTGAGCTGGCAGGAAGCG^CCAGACC^ 17040 
TRTFRADMGLSWQEAFQTED 

17041 ACCGCGGCGACGTCGAACGCC^TTGCCGCGCCCACGGCCAGGAGTTCTCCT^ 17100 
RGDVERHCRAHGQEFSWDGD 

17101 ACGTCCTGCGCACCCGCCACC^CCGCCCGGCGACCGCCGTCGACCCCX^CACCGGAGCrc 17160 
VLRTRHHRPATAVDPGTGAE 

17161 AGGTGTGGTTCAACCAGGCGCACCTGTTCCACCCGTCCAGCCTGGATCCC 17220 
VWFNQAHLFHPSSLDPDLRQ 

17221 AGGTGCTCCTGGAGACGTACGGCGAGAACGGCCTGCCCCGCGACGCCCTGrrCGCCGACG 17280 
VLLETYGENGLPRDALFADG 

17281 GCACCCCGATCCCCGACGCCGACCTGGCAACGGTCCGCGCGGCCTACACCCGCGCC^ 17340 
TPIPDADLATVRAAYTRAALi

17341 **acn™a^ 17400

17401 GCCGCGAGCCCTTCACCGGTCAGCGCCGCGTACTCGTCGCCATGACCTCGGCGGACTCAT 17460 
REPFTGERRVLVAMTS>Aw = 

17461 GAGCCGTGCCGACGCATCGGCACGCCGTCCTCCCGTCGGGGCGCTACCATCGCCGCTGTC 17520 

17521 TCGGCCATCACCCCACCCGGGCGGAGGCAACCGGCCGTGCACATCCCCGCCGTGGTCGCC 17580 

17581 ACGGCACGCGCGATCACCCGCGCCATGACCGCCCAGCCCGTTGTCACATCTGCGGAGGCG 17640 

17641 CCGCGATGACAGAGGTCCC^GGTGAACTGATCCGGGCGCTCCCGGGTGTC "700 
(orf40) 

17701 GTGCGGCGCGGGCGGGGCACACGACCGCC^CCTCGACGCAOTACGGTGTGTCACGTACC 17760 

AARAGHTTAFLDARRt-Vi 

17761 GGGAG^GAGGCG^^ 17820 

17821 AGGGGCAGACCGGGTGGCG™ "880 

17881 CTCCCCGGTGCTGCGGGCCGGAGCGGTAGGGGTGCCGCTCGATTrcGGGGCCACGGACGC 17940 
PRCCGPER'GCRSIPGPRTR 

17941 GGAGCTCGCGTACTTCCTCGAC^ACTGTGGAGCGGTGGCGGTGGTCACCGAGGAGACGCT 18000 
SSRTSSTTVERWRWSPRRRt- 

18001 GCTGCCGCGGGTCTCGCGATCGGCGGGCGTACGGATCCTGGTCGGGGGTTCGGACGCOT 18060 
CRGSRDRRAYGSWWGVRTPi 

18061 CCCGGAGGGAGCGGCTGCCGGC^TCCACTCCTTCGAGCXMCTCGCGGCGTCGGATCCGGG 18120
RRERLPASTPSSGSRRR1 

18121 GTGCGCGCCACGGGACGACCTCGGCCTCGACGAGCCGGCCTGGATCCTCTACACGTCGGG 18180 
ARHGTTSASTSRPGSSTRKfc. 

18181 GACCACGGGCCGGAGCAAGGGCGTGGTCTGCGGCCAGCGCGCCGCGCTGTGGTCCGTGGC 18240 
PRAGARAWSAASAPRCGPWK 

18241 GGCGGCGTACGTGCCGTCGTGGGGTCTGGGGCCGCAGGACCGGCTGTTCTCGCCGCTGCC 18300 
RRTCRRGVWGRRTGCCGRCP 

18301 CATGTTCCACGCCTACGCGCACTCGCTGTGCCTGCTCGGGGTCGTGGCCGTGGGCGC(3AG 18360 
CSTPTRTRCACSGWWFWMnft 

18361 CGCGTACCTCCTCGACCGGGGCGCGAGCGTCGTCCGGGCGOTGAGGAACAGCGGTGCAG 18420 
RTSSTGARASSGRIiRHSQAA 

18421 CGTCGTGGCCGGTGTACCCGCCACCTACCGCCTGCTCACGAGCGCCTTCCGCGACGCCCC 18480 
SWPVYPPPTACSRAPSATPi- 

18481 CCGGCCACCGGCCGGCCTGTOACTGTGCGTCACCGGGGGCTGCGCCGTCCCCGCCGGGGC 18540
GHRPACDCASPGAAPCPPG^ 

18541 TGCGGGCGGAOTTrGAGGAGCTGCTGGGCGTCCCGCTGCTCGACGGTTACGGCAGTACCG 18600 
RADVEELLGVPLLDGYGSTE

18601 agacc^cggca^g^^ 18660

SEQ ID NO: 3 BLM gene PPTase ORF 41

1 GGATCCTGCGCT'ACCCCMACTTCGCCCAGTCGTGCGGCACCGAGCTCACCGCCGACTGG^CGTCCGCTTCC 80

81 GCCGCGGTCTACGGGCATCTGCACATCCCCCGCGTGACCCGGTACG 160

161 C0CGC«»A^G«^^ 240 

M l TCTGGTGA.CG^^^ « 

33! xcctLL/cgaIL^ s r 

1 gcccgLIg^^^ 11° 
«!! LLLgggg^^ r.J 
sTi ggatL^ SS 

Z gcLLwc^ s 0 , 

2 Ic^c^ ;ss 
!" gLLLLLLcL™™^ rj 

187LLVPGPVVGGRR^y^ 

B81 gtcgxcacggccatcgccgxcgcggcgccggccggxacc^^ 9 2 f 9 

213VVTAlAVAAPAGTAEESAb^« 

961 G<*CGACCGGACCGCCGTCCCGT^ ^° 

to! cLLlcLcGGGCCCTC™ l "° 

1121 AGTCGGCGACGCAGACGTTGCCGTTGGTCGAGTTGAGCAGCCCGACGATGTCGATGGTGTTGCCGCAGAGGTTGATGGGG 1200 

1201 ATGTGGACGGGGATCTGGATGACGTTGCCCGAGACGACGCCCGGGGAGCCGACGGCCGCCCCCTTGGCGTTCGAGTCGGC 1280 

1281 GAGGGCGGTGCOTGAGACGCCGGCGAGCGCCGTGCCCACGGTGGCGGTGAGGGCCGCTGCCTTGGCGATTCGTGACATGG 1360 

1361 ggtgacacc^ttc^totgaca^^ 1440

1441 CGAAGGTTTCGAATCGTGCGGCGGACGGGTGACCGGCGGCCGAACGGCCTCGCCGGGCCCCCGGAAGGTGCCATGACGTC 1520
1521 CGTGCGCCATCTGTACAGCCCGGTCCCGCGCCGCGTACAAGGGACGGACGGACGGCCGGTGGACGGACGACCGGCGGGGA 1600 
1601 GGGGAGGCCATGAGCCGGATCGCGATCGTCGGGGCGGGTCAGGCCGGACTGCATCTGGCGCTGGGGCTGCTGGGGGCGGG 1680 
1681 GAGCGGCTC^CCCGTCACGAGGTGCTGCTCGTGTCCGACGGGACGCCGGACGAGATCCGCGCCGGGCGGGTGCGGTCGA 1760

BOX II. OBSERVATIONS WHERE UNITY OF INVENTION WAS LACKING
This ISA found multiple inventions as follows: 

This application contains the following inventions or groups of inventions which are not so linked as to form a single 
inventive concept under PCT Rule 13.1. In order for all inventions to be searched, the appropriate additional search fees 
must be paid. 

Croup I daim(s)l-45. 65-69, and 71-73, drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 8. 
Group II ciaim(s)M5, 65-69, and 71-73, drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes, polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 9. 
Group HI claim(s)l-45. 65-69. and 71-73, drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes, polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 10. 
Group IV claim(s)l-45, 65-69. and 71-73, drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes, polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 11. 
Group V. claim(s)i-45, 65-69, and 71-73, drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes, polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 12. 
Group VI. claim(s)M5. 65-69, and 71-73, drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes, polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 13. 
Group VII claim(s)i-45. 65-69, and 71-73, drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 14. 
Group VIII, claim(s)l-45. 65-69, and 71-73. drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes/polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 15. 
Group IX, claim(s)M5, 65-69, and 71-73, drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes, polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 16. 
Group X, claim(s)l-45, 65-69, and 71-73, drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes, polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 17. 
Group XI. claim(s)M5. 65-69, and 71-73, drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes, polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 18. 
Group XII. claim(s)M5. 65-69, and 71-73, drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes, polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 19. 
Group XIII. claim(s)l-45, 65-69, and 71-73, drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes, polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 20. 
Oroup XIV. claim(s)M5. 65-69. and 71-73. drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes, polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 21. 
Group XV, claim(s)l-45, 65-69, and 71-73. drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes! polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 22. 
Group XVI, claim(s)l-45. 65-69, and 71-73, drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes, polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 23. 
Group XVII, claim(s)M5, 65-69. and 71-73. drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes, poly peptides, expression vectors, and host cells, to the extent that these products read on ORF 24. 
Oroup XVIII, claim(s)M5, 65-69, and 71-73, drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes, polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 25. 
Group XIX, claim(s)M5. 65-69, and 71-73. drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes, polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 26. 
Group XX, claim(s)M5. 65-69. and 71-73. drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes! polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 27. 
Group Xxl. claim(s)M5. 65-69. and 71-73, drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 28. 
Group XXII. claim(s)l-45. 65-69. and 71-73, drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 29. 
Group XXIII. claim(s)M5. 65-69, and 71-73, drawn to isolated nucleic acids, gene clusters, multi-functional protein 
complexes, polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 30. 
Group XXIV claim(s)l-45. 65-69. and 71-73, drawn to isolated nucleic acids, gene clusters, multi-functional protein 
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complexes polypeptides, expression vector,, and host cells, to the extent that these pioducts read on ORF 31. 
Z^tBjMS. IS*, and 71-73. drawn to isolated nucleic acids, gen. ^J^^n^ 
comixes polypeptides, expression vectors, and host cells, to the extent that these products read on ORF 32. 

6J-69. and 71-73. drawn to isolated nucleic acids, gen. caters mult,fc»c^.. protem 
complexes polypeptides, expression vector,, and host cells, to the extent that these product, read on ORF 33 
r ln daJm(s)M5 65^9. and 71-73. drawn to isolated nucleic acids, gen. clusters, mulu-funcuonal proteu, 

SS^SZ^^ vector,, a*d host eel,,, to the extent that these product, read on ORF 34 
oZp «Vffl E(,)U5%5-69. and 71-73. drawn to isolated nucleic acid,, gen. cluster,, mulu-foncaon.. proteu, 
coZlex^VoVptides. expression vectors, and host cell,, to th. .xtent that the,e product, read on ORF 35. 

wmptexeTpolypeptide,. expression vectors, and host cells, to th. extent that th.,. product, read on ORF 36. 

TcUi»(.)MS. 65-69. and 71-73. drawn to isolated nucleic acid,, gen. Custer. mu.U-tuncUon. protem 
cZplex^polypeptid... expression vectors, and host cell,, to th. .xtent that th.se product, read on ORF 37 
Group X)6a cl.im(,)MJ. 65-69. and 71-73. drawn to isolated nucleic acid., gene cluster,, mulu-foncfconal protem 
complexes, polypeptide,, expression vectors, and host cells, to th. extent that these products read on ORF 38. 
2 ra di.)M5, ".69. and 71-73. drawn to i,olated nucleic acid, gen. clusters, multi-fu«tiona pro.cn 
complexes, polypeptides, expression vector,, and host cells, to th. extent that these product, read on ORF 39 
G r»7xX^H. cl»im(,)M5. 65^9. and 71-73. drawn to isolated nucleic acid,, gen. clusters. P roteI » 
complexes, polypeptide,, expression vectors, and host cells, to th. .xtent that these product, read on ORF 40 
oZ XXX&. cTaim(.)l-45. 65-69. and 71-73. drawn to isolated nucleic acids, gen. cluster,. n»«>«»-&»cuon.l protem 
complex.,. polyP«P«i°««. expression vector,, and host cell,, to the extent that these products read on ORF 41. 

Group XXXV, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide
encoded by ORF 8.
Group XXXVI, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide
encoded by ORF 9.
Group XXXVII, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide
encoded by ORF 10.
Group XXXVIII, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide
encoded by ORF 11.
Group XXXIX, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide
encoded by ORF 12.
Group XL, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide
encoded by ORF 13.
Group XLI, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide
encoded by ORF 14.
Group XLII, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide
encoded by ORF 15.
Group XLIII, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide
encoded by ORF 16.
Group XLIV, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide
encoded by ORF 17.
Group XLV, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide
encoded by ORF 18.
Group XLVI, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide
encoded by ORF 19.
Group XLVII, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide
encoded by ORF 20.
Group XLVIII, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide
encoded by ORF 21.
Group XLIX, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide
encoded by ORF 22.
Group L, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide
encoded by ORF 23.
Group LI, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide
encoded by ORF 24.
Group LII, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide
encoded by ORF 25.
Group LIII, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide
encoded by ORF 26.
Group LIV, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide
encoded by ORF 27.
Group LV, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide
encoded by ORF 28.
Group LVI, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide
encoded by ORF 29.
Group LVII, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide
encoded by ORF 30.
Group LVIII, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide
encoded by ORF 31.
Group LIX, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide
encoded by ORF 32.
Group LX, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide
encoded by ORF 33.
Group LXI, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide
encoded by ORF 34.
Group LXII, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide
encoded by ORF 35.
Group LXIII, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide
encoded by ORF 36.
Group LXIV, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide
encoded by ORF 37.
Group LXV, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide
encoded by ORF 38.
Group LXVI, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide
encoded by ORF 39.
Group LXVII, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide
encoded by ORF 40.
Group LXVIII, claims 46-57, drawn to methods of chemically modifying a biological molecule using a polypeptide
encoded by ORF 41.

Group LXIX, claims 58-61, drawn to methods of coupling a first amino acid to a second amino acid using a polypeptide encoded by ORF 16.
Group LXX, claims 58-61, drawn to methods of coupling a first amino acid to a second amino acid using a polypeptide encoded by ORF 17.
Group LXXI, claims 58-61, drawn to methods of coupling a first amino acid to a second amino acid using a polypeptide encoded by ORF 21.
Group LXXII, claims 58-61, drawn to methods of coupling a first amino acid to a second amino acid using a polypeptide encoded by ORF 22.
Group LXXIII, claims 58-61, drawn to methods of coupling a first amino acid to a second amino acid using a polypeptide encoded by ORF 23.
Group LXXIV, claims 58-61, drawn to methods of coupling a first amino acid to a second amino acid using a polypeptide encoded by ORF 25.
Group LXXV, claims 58-61, drawn to methods of coupling a first amino acid to a second amino acid using a polypeptide encoded by ORF 26.
Group LXXVI, claims 58-61, drawn to methods of coupling a first amino acid to a second amino acid using a polypeptide encoded by ORF 32.
Group LXXVII, claims 58-61, drawn to methods of coupling a first amino acid to a second amino acid using a polypeptide encoded by ORF 38.

Group LXXVIII, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide encoded by ORF 8.
Group LXXIX, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide encoded by ORF 9.
Group LXXX, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide encoded by ORF 10.
Group LXXXI, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide encoded by ORF 11.
Group LXXXII, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide encoded by ORF 12.
Group LXXXIII, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide encoded by ORF 13.
Group LXXXIV, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide encoded by ORF 14.
Group LXXXV, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide encoded by ORF 15.
Group LXXXVI, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide encoded by ORF 16.
Group LXXXVII, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide encoded by ORF 17.
Group LXXXVIII, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide encoded by ORF 18.
Group LXXXIX, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide encoded by ORF 19.
Group XC, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide encoded by ORF 20.
Group XCI, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide encoded by ORF 21.
Group XCII, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide encoded by ORF 22.
Group XCIII, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide encoded by ORF 23.
Group XCIV, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide encoded by ORF 24.
Group XCV, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide encoded by ORF 25.
Group XCVI, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide encoded by ORF 26.
Group XCVII, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide encoded by ORF 27.
Group XCVIII, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide encoded by ORF 28.
Group XCIX, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide encoded by ORF 29.
Group C, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide encoded by ORF 30.
Group CI, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide encoded by ORF 31.
Group CII, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide encoded by ORF 32.
Group CIII, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide encoded by ORF 33.
Group CIV, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide encoded by ORF 34.
Group CV, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide encoded by ORF 35.
Group CVI, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide encoded by ORF 36.
Group CVII, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide encoded by ORF 37.
Group CVIII, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide encoded by ORF 38.
Group CIX, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide encoded by ORF 39.
Group CX, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide encoded by ORF 40.
Group CXI, claims 62-63, drawn to methods of coupling a first fatty acid to a second fatty acid using a polypeptide encoded by ORF 41.

Group CXII, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 8.
Group CXIII, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 9.
Group CXIV, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 10.
Group CXV, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 11.
Group CXVI, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 12.
Group CXVII, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 13.
Group CXVIII, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 14.
Group CXIX, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 15.
Group CXX, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 16.
Group CXXI, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 17.
Group CXXII, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 18.
Group CXXIII, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 19.
Group CXXIV, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 20.
Group CXXV, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 21.
Group CXXVI, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 22.
Group CXXVII, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 23.
Group CXXVIII, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 24.
Group CXXIX, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 25.
Group CXXX, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 26.
Group CXXXI, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 27.
Group CXXXII, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 28.
Group CXXXIII, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 29.
Group CXXXIV, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 30.
Group CXXXV, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 31.
Group CXXXVI, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 32.
Group CXXXVII, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 33.
Group CXXXVIII, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 34.
Group CXXXIX, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 35.
Group CXL, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 36.
Group CXLI, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 37.
Group CXLII, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 38.
Group CXLIII, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 39.
Group CXLIV, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 40.
Group CXLV, claim 64, drawn to methods of producing a bleomycin or bleomycin analog using ORF 41.

Group CXLVI, claim 70, drawn to methods of converting an apo-carrier protein to a holo-carrier protein using a
phosphopantetheinyl transferase encoded by ORF 41.



The inventions listed as Groups I-CXLVI do not relate to a single inventive concept under PCT Rule 13.1 because,
under PCT Rule 13.2, they lack the same or corresponding special technical features. [illegible] (page AI-36)
[illegible] that when there is a special technical relationship among the claimed inventions [illegible] prior art.
The determination is made on the contents of the claims as interpreted in light of the description and drawings (if any).

The following is the organization of the Groups: 

Supergroup A (Groups I-XXXIV): isolated nucleic acids of any one of ORFs 8 through 41 (34 separate groups);

Supergroup B (Groups XXXV-LXVIII): methods of chemically modifying a biological molecule using any one of
ORFs 8 through 41 (34 separate groups);

Supergroup C (Groups LXIX-LXXVII): methods of coupling a first amino acid to a second amino acid using any one
of ORFs 16, 17, 21-23, 25, 26, 32, or 38 (ORFs disclosed as encoding NRPSs) (9 separate groups);

Supergroup D (Groups LXXVIII-CXI): methods of coupling a first fatty acid to a second fatty acid using any one of
ORFs 8 through 41 (34 separate groups);

Supergroup E (Groups CXII-CXLV): methods of producing a bleomycin or bleomycin analog using any one of ORFs
8 through 41 (34 separate groups); and

Supergroup F (Group CXLVI): methods of converting an apo-carrier protein to a holo-carrier protein using ORF 41
(SEQ ID NO:3) (1 group).
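
The group arithmetic in this organization can be checked mechanically. The following is only an illustrative sketch: the names `roman_to_int`, `SUPERGROUPS`, and `build_group_table` are hypothetical and not part of the search report, and the sketch assumes that groups within each Supergroup are assigned in ascending ORF order, which agrees with the legible entries in the listing (for example, Group LXXVIII with ORF 8 and Group LXXXII with ORF 12).

```python
# Hypothetical helper names; Supergroup ranges and ORF lists are taken from the
# organization of the Groups. Ascending ORF order per Supergroup is an assumption.
ROMAN = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100}


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral such as 'CXLVI' to an integer."""
    total = 0
    for ch, nxt in zip(numeral, numeral[1:] + " "):
        value = ROMAN[ch]
        # Subtract when a smaller symbol precedes a larger one (e.g. the X in XL).
        total += -value if nxt != " " and ROMAN[nxt] > value else value
    return total


ALL_ORFS = list(range(8, 42))                      # ORFs 8 through 41
NRPS_ORFS = [16, 17, 21, 22, 23, 25, 26, 32, 38]   # ORFs listed for Supergroup C

SUPERGROUPS = {
    "A": ("I", "XXXIV", ALL_ORFS),
    "B": ("XXXV", "LXVIII", ALL_ORFS),
    "C": ("LXIX", "LXXVII", NRPS_ORFS),
    "D": ("LXXVIII", "CXI", ALL_ORFS),
    "E": ("CXII", "CXLV", ALL_ORFS),
    "F": ("CXLVI", "CXLVI", [41]),
}


def build_group_table() -> dict:
    """Map each group number to (supergroup letter, ORF number)."""
    table = {}
    for name, (first, last, orfs) in SUPERGROUPS.items():
        start, end = roman_to_int(first), roman_to_int(last)
        # Each Supergroup must hold exactly one group per ORF it names.
        assert end - start + 1 == len(orfs), name
        for offset, orf in enumerate(orfs):
            table[start + offset] = (name, orf)
    return table


if __name__ == "__main__":
    table = build_group_table()
    print(len(table))                      # total number of groups
    print(table[roman_to_int("LIII")])     # Supergroup and ORF for Group LIII
```

Under these assumptions the six ranges account for 34 + 34 + 9 + 34 + 34 + 1 = 146 groups, which matches the final group number, CXLVI.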



1. The Groups within Supergroup A (Groups I-XXXIV) lack unity of invention for the following reasons:

The technical feature in Claim 1 is denoted by applicants' claim language, namely, "any one of Blm open
reading frames (ORFs) 8 through 41" (emphasis added), indicating that each individual ORF is an invention since any
ONE open reading frame satisfies the claim. However, this technical feature is not a "special technical feature" within
the meaning of PCT Rule 13.2 because it fails to distinguish over the prior art for the reasons set forth below.

At least 2 inventions in claim 1, and possibly more, do not contribute over the prior art upon a cursory prior
art search for the purposes of defining the unity of inventions. Sugiyama et al. (Gene 151 (1994) 11-16) and Calcutt et
al. (Gene 151 (1994) 17-21) teach a 14.4 kb plasmid, pMSA-1 (see Sugiyama et al., page 13, Fig. 2), which contains
[illegible]
their pMSA-1. Additionally, applicants' own specification positions blmI (ORF 10) approximately 4 kb upstream of
blmA as defined by Sugiyama et al. and Calcutt et al. (see instant specification, page 45, line 21). Clearly, in applicants'
Fig. 2, blmC (1.5 kb) and ORF 8 (1.28 kb) are downstream (closer to blmA) of blmI, indicating that these ORFs are
within 1.22 kb of blmA, and at least 3.8 kb upstream of blmA is taught in the pMSA-1 plasmid, which would encompass
all of ORF 8 and most of blmC.

Each technical feature in claim 1, i.e. ORFs 8 through 41, encodes a unique polypeptide with a unique
function with respect to the other ORFs, as supported by applicants' specification in Tables I and II, which note the
distinct enzymatic activities of the disclosed ORFs. Thus, ORFs 8 through 41 lack not only the same, but also a
corresponding special technical feature.

The 34 inventions in Claim 1, as defined by the 34 ORFs, are 34 isolated nucleic acids which, when
considered as a whole, do not contribute a common special technical feature over the prior art. These 34 inventions are
merely products, namely nucleic acids, which share the basic chemical construction of nucleic acids (deoxyribose sugar
and phosphate backbone with one of five nucleoside bases attached). While these inventions may share utility, when
coupled all together, for the production of bleomycin, that utility cannot be considered a special technical feature since it
is neither expressly claimed nor clearly identified in Claim 1.

2. The Groups within Supergroup B (Groups XXXV-LXVIII) lack unity of invention for the following reasons: 

The methods of Supergroup B lack unity because said methods use wholly different reagents (different
polypeptides encoded by different ORFs) to produce wholly different products (bleomycin analogs). In particular, the
use of any one ORF in the methods of Supergroup B renders a distinct product because the polypeptides encoded by each of
ORFs 8 through 41, as represented in Groups XXXV-LXVIII, have different and distinct functions (see instant
specification, Tables I and II).

3. The Groups within Supergroup C (Groups LXIX-LXXVII) lack unity of invention for reasons analogous to
those stated in section 2 above pertaining to Supergroup B.

4. The Groups within Supergroup D (Groups LXXVIII-CXI) lack unity of invention for reasons analogous to
those stated in section 2 above pertaining to Supergroup B.

5. The Groups within Supergroup E (Groups CXII-CXLV) lack unity of invention for reasons analogous to those
stated in section 2 above pertaining to Supergroup B.

6. Supergroup F contains only one group, Group CXLVI; however, it is named a Supergroup for consistency. 
No lack of unity is found within Supergroup F. 

7. Each member of Supergroup A lacks unity of invention with each member of Supergroups B-F for the 
following reasons: 



Form PCT/ISA/210 (extra sheet) (July 1998)* 



INTERNATIONAL SEARCH REPORT 



International application No. 
PCT/US00/00445



The technical feature of each ORF in claim 1 is a nucleic acid which can encode a polypeptide used in one of
Groups XXXV-CXLVI. However, the methods can be practiced in the absence of the isolated nucleic acid and, in fact,
are practiced in bleomycin-producing bacteria. Thus, the nucleic acids and the methods of using the encoded proteins
lack unity of invention.

8. Each member of Supergroups B-F lacks unity of invention with every other member because each Group
within Supergroups B-F produces a wholly different and distinct product. These products can be bleomycin analogs,
[illegible], 2-component fatty acids, and numerous permutations of tri-, tetra-, etc., of said products. Thus, all
these method groups lack unity of invention with each other.